PREFACE

The Agriculture and Resources Inventory Surveys Through Aerospace Remote 
Sensing is a multiyear program of research, development, evaluation, and appli- 
cation of aerospace remote sensing for agricultural resources, which began in 
fiscal year 1980. This program is a cooperative effort of the U.S. Department 
of Agriculture, the National Aeronautics and Space Administration, the National 
Oceanic and Atmospheric Administration (U.S. Department of Commerce), the 
Agency for International Development (U.S. Department of State), and the U.S. 
Department of the Interior. 

The work which is the subject of this document was performed by the Earth 
Resources Applications Division, Space and Life Sciences Directorate, Lyndon B.
Johnson Space Center, National Aeronautics and Space Administration and 
Lockheed Engineering and Management Services Company, Inc. The tasks performed 
by Lockheed Engineering and Management Services Company, Inc., were 
accomplished under Contract NAS 9-15800. 

The following personnel assisted in compiling this report, in carrying out the
tests reported here, or in providing technical inputs and consultation. These
include H. O. Hartley, T. H. Hughes, and R. L. Sielken of Texas A&M University;
Project Manager J. L. Dragg (FY 1980), Experiments Manager R. O. Hill, R. M.
Bizzell, A. H. Feiveson, C. R. Hallum, and L. C. Wade of the National
Aeronautics and Space Administration, Lyndon B. Johnson Space Center; and
L. M. Abotteen, J. E. Baird, C. L. Dailey, S. A. Davidson, and J. H. Smith of
Lockheed Engineering and Management Services Company, Inc.

1. INTRODUCTION

During the first year (fiscal year 1980) of the Foreign Commodity Production 
Forecasting (FCPF) project of the Agriculture and Resources Inventory Surveys 
Through Aerospace Remote Sensing (AgRISTARS) program, two exploratory 
experiments were performed to develop and evaluate techniques. This report 
describes the U.S. Corn and Soybeans Exploratory Experiment. The other 
experiment, the U.S./Canada Wheat and Barley Exploratory Experiment, is
described in the 1980 U.S. Wheat and Barley Exploratory Experiment Final 
Report (ref. 1). 

The overall purpose of the FCPF project is to develop and test procedures for 
using aerospace remote sensing technology to provide more objective, timely, 
and reliable crop production forecasting in foreign areas. To develop tech- 
nology for use in foreign areas, the FCPF project builds upon existing remote 
sensing technology and extends this technology to additional crops and regions 
(ref. 2). 

2. SUMMARY


2.1 PURPOSE AND SCOPE 

The overall purpose of the U.S. Corn and Soybeans Exploratory Experiment was 
to develop objective, timely, and reliable technology for production forecast- 
ing of corn and soybeans, and to conduct exploratory testing of this technol- 
ogy using data from the U.S. Corn Belt. The technology was made up of two 
sets of procedures. One set, the classification procedures, was designed to
separate corn and soybeans and provide proportion estimates at the level of a 
sampling unit (5- by 6-nautical-mile segment). The other set was designed to 
optimally allocate samples simultaneously for multiple crops and to make 
regional-level crop area and production estimates that make optimum use of 
available segment proportion estimates. These sets of procedures were to be 
evaluated for use as components of a baseline technology for adaptation to 
corn and soybeans production forecasting in foreign regions. The experiment 
plan for these evaluations was developed in 1979 during the transition year 
before AgRISTARS (ref. 3). 

2.2 TECHNOLOGY DESCRIPTIONS 
2.2.1 CLASSIFICATION PROCEDURES 

An analyst/computer-based technology has been developed for estimating the 
proportion of small grains and wheat area in 5- by 6-nautical-mile sample 
segments. The U.S. Corn and Soybeans Exploratory Experiment was the first 
attempt to extend segment-level proportion estimation techniques to other 
crops. The segment-level proportion estimates were obtained by labeling 
selected pixels from the segment as training for a maximum likelihood classi- 
fier. In one version of the procedure, the results from the classification 
were corrected for bias by using an independent set of labeled pixels. Pixel 
labeling was done using an objective procedure based on labeling techniques 
developed during previous experiments. This marks the first time an objective 
procedure was used to label pixels instead of relying entirely on the 
experience and insight of highly trained analysts to obtain pixel labels. 

2.2.2 SAMPLING AND AGGREGATION PROCEDURES


The multicrop optimum allocation procedure determines optimum sample sizes in
strata for simultaneous estimates of one, two, or three crop categories. It
minimizes overall sample size while maintaining sample coefficients of
variation (C.V.'s) below specified levels for each crop.

The optimal aggregation procedure uses a weighting and strata grouping scheme 
that is designed to make optimum use of available segment proportion estimates 
in combination with historical crop statistics. This procedure combines strata 
and differentially weights current proportion estimates and historical ratios 
to take account of stratum sample sizes and within-stratum variances. It is 
designed to make stable large-area aggregated estimates even when there are 
high rates of data loss and sizable proportion estimation variances. 

2.3 TEST DESCRIPTIONS AND RESULTS 

2.3.1 CLASSIFICATION PROCEDURES VERIFICATION TEST (CPVT) 

The two objectives of this test were to (1) determine the accuracy of the 
newly developed objective labeling procedure and recommend improvements and 
(2) determine the effectiveness of the maximum likelihood classification pro- 
cedure in producing corn and soybean proportion estimates. In this test, 1978 
full-season Landsat data from 25 segments distributed across the U.S. Corn 
Belt were processed. Evaluations were performed by comparing the labeling and 
classification results to digitized ground-truth crop inventories for the 
segments. 

Labeling accuracy was best on spectrally pure (Type I) dots and good on 
spectrally mixed (Type II) dots. This labeling accuracy is comparable to the 
accuracies previously achieved for small grains. Some unclear labeling 
instructions were discovered. When these were clarified in a later test, even 
better labeling accuracies were achieved. The results indicate that the corn 
and soybeans labeling procedure performs very well in the U.S. Corn Belt with 
full-season data. This procedure should be readily adaptable for subsequent 
experimentation and testing. 

Proportion estimates produced by the machine clustering and classification
procedure were no better than estimates made directly using Type II dots as a 
random sample. Use of the procedure resulted in underestimation of corn by an 
average of 4 percent and underestimation of soybeans by 6 percent. Alterna- 
tives to the machine processing techniques used in this experiment should be 
investigated to determine whether more effective techniques can be found. 

2.3.2 SIMULATED AGGREGATION TEST (SAT) 

The primary objective of this test was to evaluate the sampling and aggrega- 
tion components of the production estimation system. This test was a simula- 
tion test on an optimum multicrop allocation of 204 segments in the corn belt. 
Proportion estimation variances and National Oceanic and Atmospheric 
Administration (NOAA) yield model variances were taken into account in the 
allocation. Proportion estimation variances were estimated from processing 
88 segments using the corn and soybeans estimation procedure. One hundred 
simulation runs were performed in which simulated segment estimates were 
randomly designated as lost at each of five loss rates, and aggregated 
estimates of acreage and production were made. The distributions of 
aggregated estimates were compared against actual acreage and production as 
reported by the USDA. 

The simulation tests showed that the allocation procedure was producing
estimates with CV's in good agreement with the expected value of 5 percent. The
tests of the aggregation procedure demonstrated that the procedure introduced
no bias into the aggregated area and production estimates for acquisition
rates as low as 10 percent. The increases in CV's resulting from reduced
acquisition rates were reasonably small. Estimates of CV's produced by the
procedure correspond closely to the actual CV's of the simulated sample. The 
procedures should serve as a useful baseline component for large-area 
estimation of acreage and production in future experiments. 

3. CLASSIFICATION PROCEDURES VERIFICATION TEST DESCRIPTION


3.1 OBJECTIVES AND SCOPE 

The two objectives of this test were (1) to determine the accuracy of the 
newly developed objective labeling procedure and recommend improvements for 
use in the SAT, and (2) to determine the accuracy of the proportion estimation 
procedure. This test involved carrying out the procedures on a sample of test 
segments for which comparison ground-truth data were collected. 

3.2 METHOD 

3.2.1 PROCEDURE DESCRIPTION 

The procedure used to process the segments for this test is shown in figure 1. 
Using Landsat and ancillary data, an objective labeling procedure was used to 
label two sets of pixels from each segment. The major steps in the labeling 
procedure are shown in figure 2. The procedure is set up to provide increas-
ingly more detailed labeling information at each step in the procedure. The
first step consists of a decision tree labeling logic which is used to sepa-
rate the pixels into cropland and noncropland. The pixels labeled cropland in
the first step are separated into summer crops and "other crops" in the second 
step. This step also uses a decision tree labeling logic. The third step 
uses a greenness/brightness scatter plot for the separation acquisition to 
separate the summer crop pixels into corn and soybeans. Labeling methodology 
is described in a report by C. L. Dailey and K. M. Abotteen (ref. 5), which is 
included in this document as appendix B. 

Figure 1.- Diagram showing procedure for processing segment for
the Classification Procedures Verification test.

The first set of analyst-labeled pixels (called Type I dots) is used as
training for a clustering algorithm which groups all of the pixels in the
segment into clusters on the basis of their spectral values. Each of the
resulting clusters is labeled as corn, soybeans, or "other" using the labeled
Type I dot closest to the mean of the cluster. On the basis of the means and
variances for each cluster, a maximum likelihood classification of every pixel
in the segment is performed. Using the second set of analyst-labeled dots
(called Type II dots) as a random sample of the segment, the proportion based
on the classification is corrected for any "bias" introduced by the
classification process.
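
Two of the steps described above can be illustrated with a minimal Python
sketch. It is not the implementation used in the experiment. It assumes that
"closest" means Euclidean distance in spectral space and that the bias
correction is a stratified estimate in which the machine classes act as strata
and the Type II dots supply the crop fraction within each stratum. The function
names, arguments, and the fallback for a class that contains no dots are all
hypothetical.

```python
import math
from collections import Counter


def label_cluster(cluster_mean, type1_dots):
    """Return the label of the Type I dot nearest to a cluster mean.

    type1_dots is a list of (spectral_vector, label) pairs.
    Euclidean distance is an assumption of this sketch.
    """
    nearest = min(type1_dots, key=lambda dot: math.dist(cluster_mean, dot[0]))
    return nearest[1]


def corrected_proportion(class_pixel_counts, type2_dots, crop):
    """Stratified, dot-based estimate of the proportion of `crop`.

    class_pixel_counts maps each machine class to its pixel count.
    type2_dots is a list of (machine_class, analyst_label) pairs.
    """
    total_pixels = sum(class_pixel_counts.values())
    dots_in_class = Counter(cls for cls, _ in type2_dots)
    crop_dots_in_class = Counter(cls for cls, lab in type2_dots if lab == crop)

    estimate = 0.0
    for cls, n_pixels in class_pixel_counts.items():
        if dots_in_class[cls]:
            # Fraction of this machine class that the analyst called `crop`.
            fraction = crop_dots_in_class[cls] / dots_in_class[cls]
        else:
            # Hypothetical fallback: trust the machine label when no dot
            # fell in this class.
            fraction = 1.0 if cls == crop else 0.0
        estimate += (n_pixels / total_pixels) * fraction
    return estimate
```

In this sketch the uncorrected machine estimate for a crop is simply that
class's share of the pixel counts, so comparing the two values shows how much
the dot sample moves the estimate.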

3.2.2 DESIGN AND DATA SET 

The CPVT consisted of labeling and proportion estimation on 25 segments from 
four agrophysical units (APU's) in the U.S. Corn Belt using Landsat data from 
the 1978 crop year. The locations of the segments used in the CPVT are shown 
in figure 3. 

The segments in the CPVT were processed independently by three groups of 
analysts. Each segment was processed by at least two of the groups. The test 
followed a rigid experiment design so that analysis of variance techniques 
could be used to determine if the quality of the labeling and proportion esti- 
mation results were dependent on the group doing the labeling or on the APU in 
which the segment was located (ref. 6). All of the evaluations were performed 
by comparing the labeling and classification results to the digitized ground- 
truth crop inventories. 

3.3 RESULTS AND EVALUATION 

In the CPVT, statistical tests were performed to determine if there was a sig- 
nificant difference in the quality of the labeling and proportion estimation 
results due to the group performing the processing or the region in which the 
segment was located. The measures of quality used were dot labeling accuracy, 
percentage of correct classification, and proportion estimation error. A 
regional difference was observed for the dot labeling accuracy for soybeans. 
The labeling of soybeans was significantly less accurate in a predominantly 
corn-producing region than in the regions where soybeans were more prevalent. 

A group effect was found in the dot labeling accuracy for corn. One group
produced significantly more accurate dot labeling for corn. Investigation 
showed that the difference was due to a difference in the way the group placed 
the separation line on the scatter plots for corn and soybeans. 

Figure 3.- Map showing locations of the segments used in the
Classification Procedures Verification test.

The labeling accuracies for the CPVT are shown in table 1. The labeling
accuracy is comparable to the small-grains labeling accuracies previously
achieved during the Large Area Crop Inventory Experiment (LACIE). The labeling
for Type I dots was better than for Type II dots. This difference results
from the fact that the Type I dots are required to be spectrally pure, while
the Type II dots can be spectrally mixed. It is, therefore, natural to expect
better labeling accuracy on dots which are representative of a particular
crop, rather than a mixture of signatures from more than one crop.
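
As an illustration of how a "percent correctly labeled" figure of this kind can
be tabulated by ground-truth category, the following Python sketch may help. It
is an assumption about the bookkeeping, not the evaluation software used in the
test; the function name and the "all" key are hypothetical.

```python
from collections import defaultdict


def percent_correct_by_category(dots):
    """Percent of dots labeled correctly, keyed by ground-truth category.

    dots is an iterable of (ground_truth_label, analyst_label) pairs.
    The extra key "all" holds the accuracy over every dot.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, label in dots:
        totals[truth] += 1
        totals["all"] += 1
        if truth == label:
            correct[truth] += 1
            correct["all"] += 1
    return {cat: 100.0 * correct[cat] / totals[cat] for cat in totals}
```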

The proportion estimation errors as a function of the true proportion are 
shown for both corn and soybeans in figure 4. The average proportion of corn 
in the segments was 38 percent. The machine processing procedure underesti- 
mated the corn proportion by an average of 4 percent. The average proportion 
of soybeans was 28 percent. The procedure underestimated the soybeans propor- 
tion by 6 percent. All of the bias and half of the variability in the propor- 
tion estimation errors were the result of dot labeling errors. The proportion 
estimates produced by the procedure were not any better than estimates obtained
by using the Type II dots as a random sample. Therefore, the machine
processing (i.e., clustering and classification) did not improve the results. 

Since the labeling and classification accuracies were much better for spec- 
trally pure pixels than for mixed pixels, a study was made on the segments in 
this test to determine if accurate proportion estimates could be obtained from 
classification information for spectrally pure pixels. In order to perform 
the study, analysts assigned each of the pure pixels with its ground-truth 
label, and a proportion estimate was made using only these pixels. Figure 5 
shows the proportion estimation errors for two criteria for pixel purity. 
Pixels which meet the "one-half pixel" purity criterion are at least one-half 
pixel from the field boundaries. Pixels which meet the "one pixel" criterion 
are at least one pixel from the field boundaries. The results indicate that 
proportion estimates based only on pure pixels can be biased and have a great 
deal of variability. In the data set used in this test, the corn estimates 
showed a positive bias. 

This test is described in detail in a report by J. G. Carnes and J. E. Baird
(ref. 4), which is included in this document as appendix A. 

TABLE 1.- SUMMARY OF DOT LABELING RESULTS FOR THE
CLASSIFICATION PROCEDURES VERIFICATION TEST

                            Percent correctly labeled
Ground-truth category       Type I dots    Type II dots
Corn                            83             73
Soybeans                        79             64
Other                           93             86
All categories                  86             75

Figure 4.- Proportion estimation errors as a function of the
true proportion for both corn and soybeans. 

4. SIMULATED AGGREGATION TEST DESCRIPTION


4.1 OBJECTIVES AND SCOPE 

This test was accomplished in two studies. The first study involved propor- 
tion estimation of corn belt segments to provide estimates of variability of 
segment proportion estimates and to evaluate the classification procedures as 
they were modified following the CPVT. This study is described in a report by 
S. A. Davidson (ref. 7), which is included in this document as appendix C.

The second study was the simulation study that used the proportion estimation 
variances derived in the first study. The objectives of the simulation study 
were to (1) verify that the optimum multicrop sample allocation procedure pro- 
vided correct sample allocations among the strata, (2) validate the new aggre- 
gation and variance estimation logic, and (3) determine the robustness of the 
procedure under random nonresponse. This study is described in a report by 
J. H. Smith (ref. 8), which is included in this document as appendix D.

4.2 METHOD 

4.2.1 PROCEDURE DESCRIPTIONS 

The labeling procedure used in the SAT was essentially the same as that used 
in the CPVT. The changes made as a result of the CPVT were mainly improve- 
ments in the clarity of the procedure. The proportion estimation procedure 
was modified from the procedure used in the CPVT. The decision was made not to
perform the bias correction on the initial proportion estimates in the SAT.
This decision was based on a study performed by the Supporting Research project
of the AgRISTARS program (ref. 9), on the objective of providing estimates of
variability of segment proportions, and on resource considerations. Therefore,
the proportion estimation procedure involved labeling of the Type I dots, 
classification of the segment, and proportion estimation by enumeration of 
pixels in the class of interest. 

The multicrop allocation procedure tested in the second part of the SAT formu-
lates the allocation problem in terms of nonlinear programming. The sample
size is minimized using a Lagrangian multiplier technique, subject to the
constraints that the sample C.V.'s for each crop not exceed a given value 
(ref. 10). 
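
For orientation, a problem of this kind can be written in a textbook
stratified-sampling form. The statement below is a sketch under assumptions and
is not taken from ref. 10: the finite population correction is ignored, and the
symbols are introduced here only for illustration (n_h is the number of segments
in stratum h, W_h the stratum weight, S_hk the within-stratum standard deviation
for crop k, \bar{Y}_k the overall mean for crop k, and c_k the C.V. limit for
crop k).

```latex
\begin{align*}
\text{minimize}\quad & n = \sum_{h=1}^{H} n_h \\
\text{subject to}\quad &
  \mathrm{CV}_k = \frac{1}{\bar{Y}_k}
  \sqrt{\sum_{h=1}^{H} \frac{W_h^{2} S_{hk}^{2}}{n_h}} \le c_k ,
  \qquad k = 1, \dots, K \\
\mathcal{L} \;=\; & \sum_{h=1}^{H} n_h
  + \sum_{k=1}^{K} \lambda_k
  \left( \sum_{h=1}^{H} \frac{W_h^{2} S_{hk}^{2}}{n_h}
         - c_k^{2}\, \bar{Y}_k^{2} \right),
  \qquad \lambda_k \ge 0 \\
\frac{\partial \mathcal{L}}{\partial n_h} = 0
  \;\Longrightarrow\; & n_h = W_h \sqrt{\sum_{k=1}^{K} \lambda_k S_{hk}^{2}}
\end{align*}
```

Under these assumptions, only the crops whose C.V. constraints are binding have
nonzero multipliers, so the crop that is hardest to estimate in a stratum
dominates that stratum's sample size.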

The aggregation procedure tested in the second part of the SAT is shown in 
figure 6. It consists of a technique for using historical data to compensate 
for the loss of data in a particular stratum (ref. 11). The technique 
involves a weighting procedure which places more reliance on historical data 
as the classification results become less reliable because of data loss or 
errors in the classification results. 
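
The qualitative behavior of such a weighting can be shown with a minimal Python
sketch. It assumes inverse-variance weighting, which is a common way to combine
two estimates but is not necessarily the scheme of ref. 11. The function name,
its arguments, and the handling of a stratum with no acquired segments are
hypothetical.

```python
def blend_with_history(current, var_current, historical, var_historical):
    """Combine a current-season and a historical estimate for one stratum.

    Returns (blended_estimate, weight_on_current). The weight on the
    current estimate shrinks as its variance grows (assumed
    inverse-variance weighting).
    """
    if current is None:
        # No segments acquired in this stratum: rely entirely on history.
        return historical, 0.0
    total = var_current + var_historical
    if total == 0.0:
        # Both sources treated as exact; split the weight evenly.
        return 0.5 * (current + historical), 0.5
    weight = var_historical / total
    return weight * current + (1.0 - weight) * historical, weight
```

With equal variances the two sources are averaged. As the variance of the
current estimate grows, the result moves toward the historical value, which is
the behavior the text describes.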

4.2.2 DESIGN AND DATA SET 

The 88 segments in the SAT were each processed once. Twenty-three of the seg- 
ments had been processed in the CPVT. These were processed in the SAT, but by 
a different analyst group. Thirty-five additional segments with ground-truth 
inventories were processed and used in the evaluations. For 30 segments no 
ground-truth data were available. The locations of the segments used in this 
test are shown in figure 7. Evaluations of the labeling and proportion 
estimation accuracies were performed using the segments for which ground-truth
information was available. 

The simulation test of the aggregation procedure was performed by setting up 
an allocation of 204 simulated segments in 12 strata in the states of 
Illinois, Indiana, and Iowa. Historical data were used to determine the mean 
crop proportions within strata. The distribution of segment proportions was 
determined from the historical variability and from the empirical variances 
observed in the classification results. State-level historical data were used 
to determine mean yields, and the distribution of yield estimates was 
determined using NOAA yield model variance. 

A Monte Carlo simulation was performed in which segments were randomly
designated as "lost". For each loss rate, 100 simulations were performed to obtain
aggregated estimates of production. 
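
The structure of such a loss simulation can be sketched in Python as follows.
This is a simplified illustration, not the simulation software of the
experiment: it aggregates crop proportions only, uses an unweighted stratum mean
of the surviving segments, and falls back to the historical proportion when
every segment in a stratum is lost. The function name, the dictionary keys, and
the fallback rule are hypothetical.

```python
import random
import statistics


def simulate_loss(strata, loss_rate, n_runs=100, seed=0):
    """Monte Carlo estimate of the aggregated proportion under data loss.

    strata is a list of dicts with keys "weight" (stratum share of the
    region), "historical" (historical proportion) and "segments" (list of
    simulated segment proportion estimates).
    Returns (mean_estimate, cv) over n_runs simulated aggregations.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        total = 0.0
        for stratum in strata:
            # Each segment is independently designated as lost.
            kept = [p for p in stratum["segments"] if rng.random() >= loss_rate]
            if kept:
                stratum_estimate = statistics.fmean(kept)
            else:
                # Hypothetical fallback when the whole stratum is lost.
                stratum_estimate = stratum["historical"]
            total += stratum["weight"] * stratum_estimate
        estimates.append(total)
    mean_estimate = statistics.fmean(estimates)
    cv = statistics.pstdev(estimates) / mean_estimate if mean_estimate else float("nan")
    return mean_estimate, cv
```

Running the function at several loss rates and comparing the returned mean with
the no-loss value gives a rough check for bias, and the returned CV shows how
variability grows as fewer segments survive.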

4.3 RESULTS AND EVALUATION


In the SAT, the labeling accuracy was better than the accuracy in the CPVT. 
Table 2 shows a comparison of the labeling accuracies in the two tests. The 
improvement in the labeling accuracy for the second test was due to changes in 
the labeling procedure recommended on the basis of the first test and to an
improved procedure for selecting acquisitions. 

The proportion estimation results for the SAT are shown in figure 8. The
results for soybeans proportion estimation were comparable to those obtained
in the CPVT. The average soybeans proportion in the segments was 30 percent.
The procedure underestimated the soybeans proportion by an average of 8 per-
cent. For corn, the average proportion was 41 percent. In the SAT, the pro-
cedure overestimated the corn proportion by 5 percent, while in the CPVT, the
proportions were underestimated by 4 percent. The change in bias between the
two tests is due to the fact that a bias correction was not performed in the
SAT. The classification procedure was trained using only spectrally pure
pixels. When only pure pixels are used in training, a classification is pro- 
duced which is representative of the pure areas of the segment, rather than of 
the entire segment. As the pure pixel studies showed, this will produce a 
positive bias in the classification results. 

The simulation tests of the sampling and aggregation procedures were set up to 
provide large area production estimates with a CV of 5 percent for both corn 
and soybeans at a 100 percent acquisition rate. The aggregation procedure was 
tested to determine if the CV estimates computed by the procedure were 
correct, if any bias was introduced into the aggregated estimates because of 
nonresponse, and if the CV's at reduced response rates were reasonable. 

The simulation tests showed that the allocation procedure was producing
estimates with CV's in good agreement with the expected value of 5 percent
(CV = 4.7 percent for corn and CV = 5.2 percent for soybeans). The tests of the
weighted aggregation procedure demonstrated that the procedure introduced no
bias into the aggregated area and production estimates for acquisition rates
as low as 10 percent. Figure 9 shows the CV's resulting from reduced
acquisition rates for area and for production. These variances are reasonable,
and the average CV estimates produced by the procedure correspond closely to
the CV's of the simulated sample.

TABLE 2.- COMPARISON OF LABELING ACCURACY
FOR CPVT AND SAT TESTS


Percent correctly labeled

CPVT

SAT

(Type I dots)

•

86

93

79

88

93

96

86

Figure 8.- Summary of results for the Simulated Aggregation test.

5. CONCLUSIONS AND RECOMMENDATIONS




The results from the labeling evaluations indicate that the corn/soybeans
labeling procedure performs very well in the U.S. Corn Belt with full-season
(after tasseling) Landsat data. The procedure should be readily adaptable to
corn/soybeans labeling required for subsequent exploratory experiments or pilot
tests.

The machine classification procedures evaluated in this experiment were not 
effective in improving the proportion estimates. The corn proportions produced 
by the machine procedures had a large bias when the "bias" correction was not 
performed. This bias was caused by the manner in which the machine procedures 
handled spectrally impure pixels. Alternatives to the machine processing tech- 
niques used in this experiment should be investigated to see if more effective 
techniques can be found. 

The simulation test indicated that the weighted aggregation procedure performed 
quite well. Although further work can be done to improve both the simulation 
tests and the aggregation procedure, the results of this test show that the 
procedure should serve as a useful baseline procedure in future exploratory 
experiments and pilot tests. 

PREFACE

The investigation which is the subject of this document was undertaken in sup- 
port of the Foreign Commodity Production Forecasting project of the Agricul- 
ture and Resources Inventory Surveys Through Aerospace Remote Sensing program. 
Under Contract NAS 9-15800, scientists of Lockheed Engineering and Management 
Services Company, Inc., evaluated the results which are reported for the Earth 
Observations Division, Space and Life Sciences Directorate, of the National
Aeronautics and Space Administration, Lyndon B. Johnson Space Center. 

1. INTRODUCTION 

The purpose of the U.S. Corn and Soybeans Exploratory Experiment — Classifi- 
cation Procedures Verification Test was to evaluate the performance of the 
adapted Large Area Crop Inventory Experiment (LACIE) Transition Year (TY) 
classification procedure for corn and soybeans. See reference 1 for a 
discussion of the procedure used in this test. In this test, 25 segments 
selected from four agrophysical units (APU's) were processed by three groups 
of analysts. Analysis of variance techniques were used to determine the 
factors which were important to the quality of the classifications per- 
formed. The factors evaluated were group effects and APU effects. The 
classification results were evaluated to determine the effectiveness of the 
procedure in producing corn and soybeans proportion estimates. 

2. FACTORS AFFECTING THE QUALITY OF THE CLASSIFICATIONS

The segments used in this test were from APU's 14, 24, 25, and 28 located in 
Missouri, Iowa, Illinois, and Indiana. Because APU 24 had a small number of
segments and APU's 14 and 24 were reasonably similar, APU's 14 and 24 were 
merged and designated APU 14 for evaluation purposes. 

Three groups of analysts processed the segments. Group I processed 19 of the 
segments, whereas groups II and III each processed 18 segments. The alloca-
tion of the segments among the groups and APU's is shown in table 1. The
linear model and related assumptions used in the analyses of variance are des- 
cribed in reference 2. 

The following measures of classification quality were used in the analyses of 
variance: 

a. Proportion estimation error 

b. Percentage of picture elements (pixels) correctly classified 

c. Reduction in the expected proportion estimate variance if a bias correc- 
tion were applied to the classification results 

d. Analyst dot labeling accuracy 

The factors were tested for their effects in the following order: first,
interaction between groups and APU's; second, group effects; and, third, APU
effects. If a significant result was obtained at one stage, it was impossible
to test for significant results at a later stage. 

Table 2 shows the average proportion estimation error and average absolute 
proportion error for corn and soybeans by group and by APU. Significant dif- 
ferences are indicated by numbers in parentheses following the values. No 
significant effects were found in the results for corn. For soybeans, a sig- 
nificant difference in the proportion errors was found between groups II and 
III. The absolute proportion error was significantly different for APU 14. 
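
The two summary measures used in table 2 can be illustrated with a short Python
sketch. It assumes that the error is defined as the estimated proportion minus
the ground-truth proportion, so that a negative average indicates
underestimation. The function name is hypothetical.

```python
def summarize_errors(estimated, truth):
    """Return (average error, average absolute error) in the input units.

    Error is taken as estimate minus ground truth (an assumption of this
    sketch), so negative values indicate underestimation.
    """
    errors = [e - t for e, t in zip(estimated, truth)]
    n = len(errors)
    mean_error = sum(errors) / n
    mean_abs_error = sum(abs(x) for x in errors) / n
    return mean_error, mean_abs_error
```

The average error measures bias, because positive and negative errors cancel.
The average absolute error measures the typical size of the error regardless of
sign, which is why the two columns of table 2 can differ considerably.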

TABLE 1.- DISTRIBUTION OF SEGMENTS BY GROUP AND BY APU

[Parentheses indicate processed data which were not used
in the analyses of variance]

APU    Segment number    Group I    Group II    Group III
14          135             X          -           X
            202            (X)        (X)         (X)
            864             -          X           X
            865             X          -           X
            877             X          X           -
            880             -          X           X
            881             X          X          (X)
            882            (X)        (X)         (X)
25          107             X          X           -
            141             X          -           X
            144             -          X           X
            205             X          X           -
            800            (X)        (X)          -
            807             -          X           X
            809             X          -           X
28          123             X          X           -
            127            (X)        (X)         (X)
            133             -          X           X
            832             X          -           X
            837            (X)        (X)         (X)
            842             X          X           -
            843            (X)        (X)          -
            852             X          -           X
            853            (X)         -          (X)
            860             -          X           X

TABLE 2.- PROPORTION ESTIMATION ERRORS

[Significant differences are indicated by number
in parentheses following the values]

                       Corn                         Soybeans
             Average     Average absolute   Average     Average absolute
             error, %    error, %           error, %    error, %
Group I       -6.3         7.4               -6.6         7.4
Group II      -3.1         8.1               -9.0(1)      9.0
Group III     -4.8         7.1               -4.0(1)      7.0
APU 14        -5.8         7.4               -2.3         4.5(2)(3)
APU 25        -3.6         5.9               -7.3         9.0(2)
APU 28        -4.8         9.3               -9.9         9.9(3)
Overall       -4.7         7.5               -6.5         7.8

The results for the percentage of pixels correctly classified are shown in
table 3. An interaction between groups and APU's for the percentage of cor- 
rect classification (PCC) for class "other" made it impossible to determine 
group and APU effects for the PCC for "other." The only significant result 
was a group effect for the PCC for corn, where the group III result was sig- 
nificantly different from the group I and II results. 

The results of reductions in variance are shown in table 4. In analyzing the 
results for corn, a significant interaction between groups and APU's made it 
impossible to test for group and APU effects individually. There were no sig- 
nificant effects for soybeans. 

Tables 5 and 6 show the dot labeling accuracy for type 1 and type 2 dots.
There were group effects for the type 1 dot labeling accuracy for corn and for
the overall category. In both cases, group III was significantly different
from groups I and II. A significant APU effect was shown in the labeling
accuracy for class "other" in both the type 1 and type 2 dots. In both cases,
APU 14 was significantly different from APU's 25 and 28.

In summary, the observed group effects involved dot labeling accuracy and PCC 
for corn. In both cases, group III was consistently less accurate than 
groups I and II. Since all three groups were given the same training and were 
to follow the same procedures, it would appear that there was some misunder-
standing of the procedure for corn by group III.

The observed APU effects involved dot labeling accuracy and proportion estima- 
tion error for soybeans. In both cases, APU 14 had less accurate results than 
APU's 25 and 28. It appears that dot labeling for soybeans is more difficult 
in APU 14. It is interesting to note that, although the dot labeling for 
type 1 dots showed a significant difference, the PCC for the classifications 
based on these dots did not show a significant difference. 

TABLE 3.- PERCENTAGE OF PIXELS CORRECTLY CLASSIFIED

[Significant differences are indicated by number 
in parentheses following the values] 


Group I 
Group II 
Group III 
APU 14 
APU 25 
APU 28 

Overall 


Corn PCC Soybeans PCC "Other" PCC Overall PCC 


73.2(1) 
75.6(2) 
62.6(1 ) (2) 

77.8 

69.9 
63.6 


TABLE 4.- PERCENTAGE OF REDUCTION IN VARIANCE EXPECTED IF BIAS
CORRECTION IS PERFORMED ON CLASSIFICATION RESULTS

             Corn     Soybeans
Group I      61.0       53.2
Group II     62.8       59.3
Group III    61.6       59.5
APU 14       58.9       55.4
APU 25       62.2       59.2
APU 28       64.3       57.4
Overall      61.8       57.3

TABLE 5.- TYPE 1 DOT LABELING ACCURACY 

[PCL = percentage of dots correctly labeled; significant differences 
are indicated by the number in parentheses following the values] 

             Corn PCL      Soybeans PCL   "Other" PCL    Overall PCL 
Group I      88.3(1)       79.9           89.5           86.8(3) 
Group II     89.2(2)       76.2           88.3           86.8(4) 
Group III    67.0(1)(2)    66.1           85.8           7/.8(3)(4) 
APU 14       83.5          83.3           76.9(5)(6)     83.5 
APU 25       85.9          65.1           89.6(5)        82.7 
APU 28       75.1          73.8           97.1(6)        85.1 
Overall      81.5          74.1           87.9           83.8 


TABLE 6.- TYPE 2 DOT LABELING ACCURACY 

[PCL = percentage of dots correctly labeled; significant differences 
are indicated by the number in parentheses following the values] 

             Corn PCL      Soybeans PCL   "Other" PCL    Overall PCL 
Group I      66.9          70.4           85.9           74.9 
Group II     70.5          60.6           86.5           74.3 
Group III    64.5          61.1           80.7           70.9 
APU 14       70.8          72.8           76.6(1)(2)*    73.6 
APU 25       70.7          61.8           89.3(1)        76.3 
APU 28       60.5          57.5           87.2(2)        70.3 
Overall      67.3          64.0           84.4           73.4 




3. CLASSIFICATION PROCEDURE EVALUATION 


In order to determine the effectiveness of the classification procedure in 
producing proportion estimates, the various stages in the classification pro- 
cedure must be investigated. One way of doing this is to calculate proportion 
estimates based only on the information available at a particular stage. By 
comparing the accuracy at the different stages, one can determine which steps 
are necessary and which steps are not. 

The classification procedure consists of the following steps: 

a. Two sets of dots are labeled as corn, soybeans, or "other" by the analyst. 

b. Using one set of analyst-labeled (type 1) dots as seed pixels, all pixels 
in the segment are grouped into clusters on the basis of their spectral 
values. 

c. Each of the clusters is labeled as corn, soybeans, or "other" by the 
analyst-labeled type 1 dot closest to the mean of the cluster. 

d. On the basis of the means and variances for each cluster, every pixel in 
the segment is classified as corn, soybeans, or "other." 

e. Using the second set of analyst-labeled (type 2) dots as a random sample 
of the segment, the proportions based on the classification are corrected 
for any bias introduced by the classification process. 
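
Step c above can be illustrated with a minimal sketch. The function name, the use of Euclidean distance, and the two-band sample values are assumptions made for illustration only; the report does not state which distance measure was used.

```python
import math

def label_clusters(cluster_means, type1_dots):
    """Give each cluster the label of the type 1 dot nearest its mean.

    cluster_means: list of spectral mean vectors, one per cluster.
    type1_dots: list of (spectral_vector, label) pairs from the analyst.
    Euclidean distance is an assumption; the report does not name a metric.
    """
    labels = []
    for mean in cluster_means:
        nearest = min(type1_dots, key=lambda dot: math.dist(mean, dot[0]))
        labels.append(nearest[1])
    return labels

# Hypothetical two-band values, for illustration only.
dots = [((30.0, 60.0), "corn"), ((28.0, 75.0), "soybeans"), ((45.0, 40.0), "other")]
means = [(31.0, 62.0), (44.0, 42.0)]
cluster_labels = label_clusters(means, dots)
```

With these sample values, the first cluster takes the label of the corn dot and the second takes the label of the "other" dot.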

Proportion estimates can be calculated at the following four stages in the 
classification procedure: 

a. At the dot labeling stage, the type 2 dots can be aggregated on the basis 
of their labels to determine a proportion. 

b. At the clustering stage, a proportion can be determined by aggregating the 
pixels in a cluster on the basis of the label assigned to the cluster. 

c. At the classification stage, a proportion can be determined by aggregating 
the pixels on the basis of the labels assigned by the classifier. 

d. At the bias-correction stage, the final estimate produced by the procedure 
can be used. 
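
The aggregation at the dot labeling and clustering stages can be sketched as follows. The function names and the sample inputs are hypothetical; only the idea of summing dots or pixels by label comes from the text.

```python
from collections import Counter

def dot_proportions(dot_labels):
    """Proportion (percent) of each label among the type 2 dots."""
    counts = Counter(dot_labels)
    total = len(dot_labels)
    return {label: 100.0 * n / total for label, n in counts.items()}

def cluster_proportions(cluster_sizes, cluster_labels):
    """Proportion (percent) of pixels per label, from labeled clusters."""
    totals = Counter()
    for size, label in zip(cluster_sizes, cluster_labels):
        totals[label] += size
    n_pixels = sum(cluster_sizes)
    return {label: 100.0 * n / n_pixels for label, n in totals.items()}

# Hypothetical inputs, for illustration only.
p_dots = dot_proportions(["corn", "corn", "soybeans", "other"])
p_clu = cluster_proportions([500, 300, 200], ["corn", "other", "corn"])
```

The classification-stage proportion follows the same pattern as the cluster-stage one, with one label per pixel instead of one label per cluster.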
The set of classifications used in this evaluation is listed in table 1. For 
the purposes of evaluating the classification process, five of the 
classifications were not used: 382 and 127 by group I; 881 by group II; 837 and 860 by 
group III. Eliminating these classifications resulted in each segment being 
represented twice by two different groups. Groups I and II were represented 
17 times each, whereas group III was represented 16 times. 

Although it is possible to determine a proportion at the clustering stage, 
clustering proportions are not presented. The cluster-based proportions are 
not included because the cluster and classification proportions are essen- 
tially identical. Figure 1 shows the classification proportions P(CLS) as a 
function of the cluster proportions P(CLU) for the segments involved in this 
evaluation. The linear regressions shown in the figure indicate an almost 
perfect correlation between the two proportion estimates (R² = 0.99907). 
Therefore, proportion estimates are calculated for the type 2 dots, clas- 
sification, and bias-correction stages. 
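
For a simple linear regression, the R² value quoted above is the squared correlation between the two sets of proportions. A minimal sketch of that calculation is given below; the sample proportions are hypothetical and are not the values plotted in figure 1.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical cluster and classification proportions (percent).
p_clu = [10.0, 25.0, 40.0, 55.0]
p_cls = [10.5, 24.0, 41.0, 54.5]
r2 = r_squared(p_clu, p_cls)
```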

Figure 2 shows the errors in the proportion estimates as a function of the 
true proportion. The mean error, standard deviation, and mean square error 
for each estimator are presented in table 7 (page 3-7). The mean error is a 
measure of the bias in the estimator. The standard deviation is a measure of 
the estimator's variability. The mean square error is an indication of the 
overall performance of the estimator. 
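
The three statistics can be computed as in the sketch below. It assumes that the standard deviation is the sample standard deviation (divisor n - 1) and that the mean square error is the average of the squared errors; the report does not state the divisors, and the sample values are hypothetical.

```python
import math

def error_summary(estimates, truths):
    """Mean error, sample standard deviation, and mean square error."""
    errors = [e - t for e, t in zip(estimates, truths)]
    n = len(errors)
    mean_error = sum(errors) / n
    # Assumed divisor n - 1 (sample variance).
    variance = sum((d - mean_error) ** 2 for d in errors) / (n - 1)
    mse = sum(d * d for d in errors) / n
    return mean_error, math.sqrt(variance), mse

# Hypothetical proportion estimates and true proportions (percent).
estimates = [32.0, 41.0, 18.0, 27.0]
truths = [30.0, 45.0, 20.0, 25.0]
mean_error, std_dev, mse = error_summary(estimates, truths)
```

Under these definitions the mean square error equals the squared mean error plus (n - 1)/n times the variance, so it combines the bias and the variability in one number.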

The mean error for corn was negative at the dot labeling and bias-correction 
stages and positive at the machine classification stage. The mean square 
errors were nearly the same at the dot labeling and bias-correction stages. 
This indicates that the machine processing did not improve the proportion 
estimate. The type 2 dots produced as good an estimate by themselves as 
when they were used to establish a bias-correction factor for the machine 
classification. 
Figure 1.- Comparison between classification proportions and cluster proportions. 

Figure 2.- Proportion estimates using analyst labeling as input. 

Figure 3.- Proportion estimates using ground-truth labeling as input. 

TABLE 9.- U.S. CORN AND SOYBEANS EXPLORATORY EXPERIMENT - CLASSIFICATION 
ERRORS USING GROUND-TRUTH LABELS AS INPUT 

                            ----------- Corn -----------   --------- Soybeans --------- 
Source of classification    Mean    Standard   Mean        Mean    Standard   Mean 
                            error   deviation  square      error   deviation  square 
                                               error                          error 

Type 2 dots as               1.55     5.19      28.3        1.00     4.14      17.5 
  random sample 
Machine classification       8.21     8.98     144.7       -2.28     5.63      35.6 
Bias-corrected               1.00     4.07      17.0        0.47     3.08       9.3 
  machine classification 


TABLE 10.- U.S. CORN AND SOYBEANS EXPLORATORY EXPERIMENT - CLASSIFICATION 
IMPROVEMENT USING GROUND-TRUTH LABELS AS INPUT 

                             -------- Corn ---------      ------ Soybeans ------- 
Classification sources       Processing    Mean           Processing    Mean 
compared                     improved, %   improvement    improved, %   improvement 

Machine classification           20          -5.05            36          -1.26 
  vs. type 2 dots 
Bias correction vs.              76           5.70            76           2.24 
  machine classification 
Bias correction vs.              60           0.65            64           0.98 
  type 2 dots 
improvement is not great enough to warrant the effort involved in performing 
the machine classification. 

The most interesting feature of the ground-truth-based classification results 
is the large mean error in the machine classification proportions for corn. 

The plot in figure 3 shows that the error increases with increased true pro- 
portion. In fact, the mean square error of 144.7 (table 9) is larger than the 
mean square error of 103.8 for the analyst-based machine classification 
results (table 7). This indicates a serious problem with the procedure, since 
one would expect the results to improve or remain the same when true labels 
are substituted for analyst labels. 

A possible source for the bias could be that the type 1 dots, used as input 
for the classification, are not representative of the entire segment. In 
order to determine if the type 1 dots are representative of the segment as a 
whole, a proportion estimate can be calculated using the type 1 dots as a ran- 
dom sample of the segment. If the type 1 dots are representative of the seg- 
ment, the estimate should be unbiased. Figure 4 shows the proportion estima- 
tion error for the type 1 dots. As one might expect, the corn estimate has an 
8.48-percent positive bias. This is very close to the bias of 8.21 percent in 
the classification estimate. The type 1 dot estimate shows the same trend as 
the classification estimate. Therefore, the type 1 dots are not representa- 
tive of the segment, which is responsible for the bias in the classification 
results. 

The question to consider now is: Why are the type 1 dots a biased sample of 
the segment? These dots are a set taken from a random grid; thus, the 
location should not produce a bias. One restriction was placed on the dots: 
a dot that falls on a field boundary is not used. In this particular test, 
type 1 dots were used only if they were more than one-half pixel away from a 
field boundary. If the proportion is calculated using all of those pixels 
which meet the purity criterion and this estimate is biased with respect to 
the true proportion, then the purity restriction on the type 1 dots is the 
source of the observed bias. Figure 5 shows errors in the proportions based 
on all pure pixels in the segment as a function of the true proportion. The 
proportion errors for corn show the same trend to greater error with increased 
proportion, as seen in the type 1 dot proportion and classification results. 
The mean error for corn is 7.61 percent, which is consistent with the errors 
observed for the type 1 dot and classification estimates. 

The conclusion from this analysis would be that the type 1 dots are more 
representative of the pure pixels in the scene than of the entire scene. 

Since the pure pixels are a biased sample of the segment, the proportions 
based on the type 1 dots and on the classification will also be biased. One 
way of verifying this conclusion is to compare the proportion estimates with 
the ground-truth proportions based on pure pixels. If the mean error, 
standard deviation, and mean square error are less when the pure pixel ground-truth 
proportion is used rather than the entire scene ground-truth proportion, then 
the proportions are more representative of the pure pixels than of the entire 
scene. Figure 6 shows the results of these comparisons. The corn estimates 
do not show the large positive bias evident when the entire scene proportion 
is used as the true proportion. The mean errors, standard deviations, and 
mean square errors corresponding to figure 6 are presented in table 11. The 
mean errors for the corn estimates are reduced from more than 8 percent to 
less than 1 percent. There was a slight reduction in the standard deviation. 
The mean square error was reduced by 50 percent or more. The results for 
soybeans were not as straightforward as those for corn. Although the mean square 
error for the type 1 dots decreased slightly when pure pixel proportions were 
used, the mean square error for the classification actually increased. These 
changes are not significant because the pure pixel and entire scene ground- 
truth proportions were close. 
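
The size of the reduction for corn can be checked directly from the mean square errors in table 11. The short calculation below is only a worked check; the helper name is hypothetical.

```python
def percent_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before

# Corn mean square errors from table 11: entire scene vs. pure pixels.
type1_dots = percent_reduction(238.9, 110.6)
machine_cls = percent_reduction(144.7, 51.9)
```

The two reductions come out at roughly 54 percent for the type 1 dots and 64 percent for the machine classification, in line with the statement that the mean square error was reduced by 50 percent or more.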

The bias and about one-half of the variability in the proportion estimates are 
the result of analyst dot labeling errors. A summary of the analyst dot 
labeling accuracy is shown in tables 12 and 13. The overall accuracy for 
type 1 dot labeling was 86 percent, whereas the accuracy for type 2 dot label- 
ing was 75 percent. This is probably a consequence of the fact that all of 
the type 1 dots were pure, whereas type 2 dots could be impure. One can 
Figure 6.- Proportion estimate error using pure pixel proportions as true proportions 
(based on ground-truth labels for dots) versus pure pixel proportions. 






TABLE 11.- EFFECT OF USING PURE PIXEL GROUND-TRUTH PROPORTIONS 
ON CLASSIFICATION ERRORS 

Crop       Source of classification   Source of ground-   Mean    Standard    Mean square 
           estimate                   truth proportion    error   deviation   error 

Corn       Type 1 dots as             Entire scene         8.48    13.19       238.9 
           random sample              Pure pixels           .93    10.69       110.6 
           Machine classification     Entire scene         8.21     8.98       144.7 
                                      Pure pixels           .66     7.32        51.9 
Soybeans   Type 1 dots as             Entire scene          .96     8.38        68.4 
           random sample              Pure pixels         -1.18     6.97        48.0 
           Machine classification     Entire scene        -2.28     5.63        35.6 
                                      Pure pixels         -4.41     4.93        42.8 
TABLE 12.- DOT LABELING ACCURACY FOR TYPE 1 DOTS 

Crop               Dots labeled   Dots labeled   Dots labeled   Dots correctly 
                   corn           soybeans       "other"        labeled, % 

Corn                  647             34             71              86 
Soybeans               54            392             52              79 
"Other": 
  Wheat                 3              0             23              88 
  Oats                  1              0              8              89 
  Grass                 0              1              7              88 
  Hay                   3              2             40              89 
  Pasture               7              1            138              95 
  Trees                 6              1            142              95 
  Clover                0              0              9             100 
  Vegetable             0              0              2             100 
  Water                 0              0             14             100 
  Nonagriculture        1              3             41              91 
  Homestead             1              0             27              96 
  Idle                  3              2             35              88 
  Total "other"        25             10            486              93 

TABLE 13.- DOT LABELING ACCURACY FOR TYPE 2 DOTS 


Total "other" 


Dots 

labeled 

"other" 

Dots 

correctly 
labeled, % 

456 

73 

341 

64 

93 

81 

64 

79 

22 

71 

124 

90 

421 

87 

343 

93 

5 

45 

9 

100 

35 

95 

131 

86 

95 

88 

119 

78 

1461 

86 
explain the fact that the soybean proportion estimates based on classification 
results were better than those based on the type 2 dots when analyst labels 
were used. Although the classification estimates are usually less accurate, 
the better labeling for the type 1 dots was enough to improve the 
classification results. In looking at the confusion between the categories (corn, 
soybeans, and "other"), it appears that there is greater confusion between corn 
and "other" than between corn and soybeans. 
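
For the type 1 dots, the accuracy figures and the confusion counts can be tallied from table 12 as in the sketch below. The counts are those in the table; the variable and function names are illustrative only.

```python
# Rows are ground-truth categories, columns are analyst labels
# (corn, soybeans, "other"); counts are taken from table 12.
confusion = {
    "corn": (647, 34, 71),
    "soybeans": (54, 392, 52),
    "other": (25, 10, 486),
}
order = ["corn", "soybeans", "other"]

def percent_correct(truth):
    """Percentage of dots of one ground-truth category labeled correctly."""
    row = confusion[truth]
    return 100.0 * row[order.index(truth)] / sum(row)

correct = sum(confusion[c][order.index(c)] for c in order)
total = sum(sum(row) for row in confusion.values())
overall = 100.0 * correct / total

# Dots confused between two categories, counted in both directions.
corn_other = confusion["corn"][2] + confusion["other"][0]
corn_soybeans = confusion["corn"][1] + confusion["soybeans"][0]
```

The tally gives an overall accuracy of about 86 percent, with 96 dots confused between corn and "other" and 88 confused between corn and soybeans.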

In order to determine how well the clustering algorithm is working in separat- 
ing the crop of interest from a noncrop, the cluster purities were calculated 
for corn and for soybeans. Histograms of cluster purity are shown for corn 
and soybeans in figures 7 and 8. The number of clusters with given crop pro- 
portions is plotted as a function of the crop proportion. Ideally, these his- 
tograms should show two maxima (at 0 percent and 100 percent) representing 
pure noncrop and crop clusters. The histogram should be zero at the center. 

In the figures, one does see the expected two maxima with a minimum at 
approximately 50 percent. The crop maximum is fairly broad, but it appears that the 
clustering algorithm is separating crop and noncrop pixels to a certain 
extent. 
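
A cluster purity histogram of the kind described above could be built as in the following sketch. The bin width, the function name, and the sample purities are assumptions; the report does not give the binning used in figures 7 and 8.

```python
def purity_histogram(cluster_crop_fractions, n_bins=10):
    """Count clusters by the percentage of their pixels that are the crop.

    cluster_crop_fractions: crop fraction (0.0 to 1.0) for each cluster.
    Returns n_bins counts covering 0-100 percent in equal-width bins.
    """
    counts = [0] * n_bins
    for fraction in cluster_crop_fractions:
        index = min(int(fraction * n_bins), n_bins - 1)  # 1.0 goes in the top bin
        counts[index] += 1
    return counts

# Hypothetical cluster purities, for illustration only.
fractions = [0.0, 0.02, 0.05, 0.45, 0.85, 0.93, 0.97, 1.0]
histogram = purity_histogram(fractions)
```

An ideal clustering would place all of the counts in the first and last bins; counts near the middle bins indicate clusters that mix crop and noncrop pixels.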
4. SUMMARY OF RESULTS 


Based on the studies presented in this document, the following conclusions can 
be reached: 

a. The proportion estimates for corn had a bias of -4 percent with a standard 
deviation of 8 percent. 

b. The proportion estimates for soybeans had a bias of -6 percent with a 
standard deviation of 7 percent. 

c. The bias and about one-half the standard deviation for both corn and 
soybeans were the result of dot labeling errors. 

d. Proportion estimates based on the type 2 dots as a random sample are as 
good as the final bias-corrected results. 

e. The machine classification results are essentially identical to the machine 
clustering results. 

f. The large bias observed in the classification proportions for corn (when 
true labels are used) is caused by bias in the type 1 dots used as input 
to the classification procedure. 

g. The bias in the type 1 dots was present because the type 1 dots were 
required to be pure. 

h. Although the three groups used to process the segments were given identi- 
cal training and used identical procedures, one group had significantly 
different dot labeling accuracy. 

i. It is more difficult to label "other" dots in APU 14 than it is in 
APU's 25 and 28. 
5. RECOMMENDATIONS 

Dot labeling errors are the greatest source of error in the proportion esti- 
mates. If the quality of the proportion estimates is to be improved, the cur- 
rent dot labeling techniques need to be improved or an alternative for dot 
labeling found. 

Since the machine processing used in this test does not significantly improve 
the accuracy of the corn and soybeans proportion estimates, the proportion 
estimates can be made using the labeled dots as a random sample of the 
segment. Alternatives to the machine processing technique used in this test 
should be investigated to see if a more effective technique can be found. 

Since the maximum likelihood classification results are identical to the 
results using labeled clusters, it is not necessary to perform the maximum 
likelihood classification. The proportion estimates based on the clustering 
results should be bias corrected using a random dot set so that the kind of 
bias reflected in the corn proportion estimates can be reduced. 
PREFACE 

This report offers a detailed description of the decision logic and procedure 
developed for identification of corn and soybeans in the U.S. Corn Belt. 
Development and testing of the procedure are outlined and a summary of 
significant results is presented. 

The development and testing of the corn/soybean decision logic procedure was a 
team effort which required the expertise of many individuals. The major 
effort of designing the hierarchical structure of the decision logic was 
coordinated by W. P. Palmer, who documented the initial decision logic in an 
internal communication (section 5). Major sections of that document are 
reproduced in this report. J. D. Nichols and W. L. West analyzed image and 
ground-truth data and constructed the cropland identification step of the 
decision logic. T. E. Johnson, B. B. Schroder, and R. D. Pickerel developed 
the initial framework for the separation of corn and soybeans using image 
products of the Large Area Crop Inventory Experiment. W. W. Austin aided in 
the analysis of spectral aids. These individuals were major contributors to 
the development of the corn/soybean decision logic. 

The authors would like to thank the analysts from both the National Aero- 
nautics and Space Administration and Lockheed Engineering and Management 
Services Company, Inc. who participated in the tests. Also, the authors wish 
to thank J. G. Carnes for the preliminary test results which appear in this 
paper. 
1. INTRODUCTION 

This paper shows the development and testing of an analysis procedure which 
was developed to improve the consistency and objectivity of crop identifica- 
tion using Landsat data. The procedure was developed to identify corn and 
soybean crops in the U.S. Corn Belt region. The procedure consists of a 
series of decision points arranged in a tree-like structure, the branches of 
which lead an analyst to crop labels. The specific decision logic is designed 
to maximize the objectivity of the identification process and to promote the 
possibility of future automation. 

In prior procedures, the interpretation function was more loosely structured 
and many steps were very subjective. The analyst was responsible for 
accumulating information from various sources and for assimilating and integrating 
the information in order to determine the most likely label for a signature. 
Labeling accuracies of these procedures were related to the experience of the 
analyst, and labeling errors were sometimes hard to diagnose. 

This decision logic is a hierarchy of decisions that uses a step-by-step pro- 
cedure to lead the analyst from general major land-use categories to the 
specific identification of corn and soybean signatures. In the first step, 
analysis of the signatures on the imagery is governed by answers given at 
decision points on the decision tree and results in the differentiation of 
cropland from other major land-use categories. In step two, image products 
are used to answer more specific questions to separate cropland into summer 
and nonsummer crops. In step three, summer crops are identified as definite 
corn and soybeans through the aid of numerical spectral information in graphic 
form. Any remaining signatures are labeled in step four by comparing them to 
definite corn and soybean profiles and choosing the label of the most similar 
profile. Each component of the decision logic will be further discussed in 
terms of its function, strengths, and weaknesses. 
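
Step four, in which a remaining signature takes the label of the most similar definite profile, can be sketched as below. The use of Euclidean distance as the similarity measure, the function name, and the green-number profiles are all assumptions for illustration; the report describes the comparison as an analyst judgment.

```python
import math

def most_similar_label(profile, reference_profiles):
    """Return the label whose reference profile is closest to `profile`.

    profile: green-number values, one per acquisition.
    reference_profiles: dict mapping a label to its reference profile.
    Euclidean distance is an assumed similarity measure.
    """
    return min(reference_profiles,
               key=lambda label: math.dist(profile, reference_profiles[label]))

# Hypothetical green-number profiles over four acquisitions.
references = {
    "corn": [5.0, 25.0, 30.0, 10.0],
    "soybeans": [3.0, 15.0, 32.0, 20.0],
}
label = most_similar_label([4.0, 17.0, 31.0, 18.0], references)
```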



Two tests were performed to evaluate the decision logic. Labeling accuracies 
pertaining to the developmental task are summarized, and procedural problems 
and recommendations are discussed in this paper. The complete analysis of the 
accuracy of the tests is contained in an accuracy assessment report (ref. 1). 


2. OBJECTIVES 


This research effort was designed to develop and test a decision logic for 
corn and soybean identification. The objectives of the effort were to 

• Define a tree-type structure of decision points that describes the image 
interpretation process 

• Determine from all available analyst aids those to be used at various 
decision points 

• Define a procedure so that labeling errors can be easily diagnosed 

• Test the decision logic and obtain labeling results for further development 
3. DATA SET 

Eight segments (9- by 11-kilometer area), located in four agrophysical units 
(APU) of the U.S. Corn Belt, were used in developing the technique. Table 3-1 
displays the segment numbers, locations, APU's, available acquisitions, and 
major crops. The data set was selected according to the following criteria: 

a. Presence of the crops of interest (corn and soybeans) 

b. Good acquisition histories 

c. Availability of ground-truth data 
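
The acquisition dates in table 3-1 are given with a day-of-year number (labeled Julian date) in parentheses. The sketch below shows how such a number can be reproduced. The acquisition year is not stated in this section, so the default year is only a placeholder for a non-leap year, which is what the tabulated numbers are consistent with.

```python
from datetime import date

def day_of_year(month, day, year=1979):
    """Day-of-year number; the default year is only a non-leap placeholder."""
    return date(year, month, day).timetuple().tm_yday

june_16 = day_of_year(6, 16)
oct_19 = day_of_year(10, 19)
```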

The products available for analyst use include (1) Landsat film products, 
which are false color composites of three of the four bands of the 
satellite's multispectral scanner (MSS); (2) crop calendars; (3) meteorological 
summaries; and (4) spectral aids in the form of plots of transformed 
spectral values from the MSS. 

There are three types of film products: Product 1 is a simulated color- 
infrared (CIR) composite image using Landsat bands 4, 5, and of the Landsat 
MSS (ref. 2); Product 2 is an enhanced image using Landsat bands 5, 6, and 7; 
and Product 3 is a simulated CIR composite image using Landsat bands 4, 5, 
and 7 with different gains and biases set to minimize color distortion. Each 
product is 196 pixels (picture elements) across and 117 lines down and is 
partitioned by a 10-by-10 grid system. 

Two types of crop calendars were used. Normal crop calendars were generated 
for corn and soybeans within designated crop reporting districts (CRD's) in 
the Corn Belt. The calendars, as shown in figure 3-1, display the percentage 
(Y-axis) of a crop that is at or past a specific growth stage. The time 
(X-axis) is displayed in 15-day intervals throughout the growing season. 

These calendars are based on two or more years of historical data. Current- 
year crop calendars were constructed from actual field observations collected 
on approximately 10 fields per segment at various points throughout the 
growing season. The format of the current-year crop calendar is shown in 
figure 3-2. 
TABLE 3-1.- THE DEVELOPMENT DATA SET 

Segment   Location               APU 

209       Gentry, Missouri 
211       Grundy, Missouri 
          Marshall, Iowa 
          Iroquois, Illinois 

Acquisition date (Julian date) 

June 16 (167), July 4 (185), July 31 (212), Aug 8 (220), Aug 9 (221), 
Sept 4 (247), Sept 22 (265), Sept 23 (266), Oct 1 (274), Oct 19 (292) 

June 15 (166), July 3 (184), July 21 (202), Aug 8 (220), Sept 4 (247), 
Sept 22 (265), Oct 1 (274), Oct 19 (292), Oct 28 (301) 

June 15 (166), Aug 17 (229), Sept 4 (247), Sept 22 (265), Oct 1 (274), 
Oct 19 (292) 

June 12 (163), Aug 5 (217), Aug 23 (235), Aug 31 (243), Sept 1 (244), 
Sept 9 (252), Sept 28 (271), Nov 2 (306), Nov 3 (307) 

Major crops 

Corn, Soybeans, Hay, Pasture 
Corn, Soybeans, Sorghum, Hay, Pasture 
Corn, Soybeans, Oats, Pasture 
Corn, Soybeans, Oats, Hay 


TABLE 3-1.- Concluded. 

Segment   Location               APU 

854       Tippecanoe, Indiana 
883       Palo Alto, Iowa 
          Pottawatomie, Iowa     14 

Acquisition date (Julian date) 

June 10 (161), July 26 (207), Aug 9 (221), Aug 21 (233), Aug 22 (234), 
Sept 8 (251), Sept 9 (252), Sept 26 (269), Sept 27 (270), Nov 2 (306), 
Dec 17 (351) 

July 5 (186), July 23 (204), Aug 1 (213), Aug 10 (222), Sept 24 (267), 
Oct 20 (293), Oct 30 (303) 

June 16 (167), July 5 (186), July 23 (204), July 31 (212), Sept 6 (249), 
Sept 15 (258), Sept 24 (267), Oct 20 (293), Nov 7 (311) 

June 16 (167), July 23 (204), Aug 9 (221), Sept 23 (266), Sept 24 (267), 
Oct 20 (293) 

Major crops 

Corn, Soybeans, Clover, Pasture 
Corn, Soybeans, Hay, Pasture 
Corn, Soybeans, Oats, Pasture 
Corn, Soybeans, Oats, Hay, Pasture 

NORMAL CROP-CALENDAR PLOTS FOR STATE: IOWA, CROP REPORTING DISTRICT 2 

Figure 3-1.- Normal crop calendar. 

Figure 3-2.- Current year crop calendar for segment 883. 

The meteorological summaries offer a synopsis of the weather at the state 
level and are available on a weekly basis. 

Spectral aids which include scatter plots, time plots, and trajectory plots 
are generated before interpretation to aid in labeling. The data (209 grid 
intersection pixels called dots) are transformed into Kauth space before the 
aids are generated (ref. 3) and greenness is changed to green number by 
subtracting a calculated soil line (ref. 4). 
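
The transformation and the soil-line subtraction can be sketched as follows. The coefficients are the commonly quoted Kauth-Thomas values for the four MSS bands and are an assumption here, since the values used in ref. 3 are not reproduced in this report; the soil-line value and the band counts are hypothetical.

```python
# Commonly quoted Kauth-Thomas coefficients for MSS bands 4, 5, 6, 7.
# They are an assumption here; the report's ref. 3 may use other values.
BRIGHTNESS = (0.433, 0.632, 0.586, 0.264)
GREENNESS = (-0.290, -0.562, 0.600, 0.491)

def dot_product(weights, bands):
    """Weighted sum of the four band values."""
    return sum(w * b for w, b in zip(weights, bands))

def green_number(bands, soil_line):
    """Greenness minus a soil-line greenness value (hypothetical input)."""
    return dot_product(GREENNESS, bands) - soil_line

# Hypothetical MSS counts for one dot and a hypothetical soil line value.
bands = (20.0, 15.0, 60.0, 30.0)
brightness = dot_product(BRIGHTNESS, bands)
green = green_number(bands, soil_line=5.0)
```

Subtracting the soil line means that a bare-soil dot has a green number near zero, so the green number can be read as the amount of green vegetation above the soil background.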

The scatter plot in figure 3-3 is a graphic representation of the transformed 
MSS data. The typical green-number-versus-brightness scatter plot is triangu- 
lar in shape. The base of the triangle contains the bare soil pixels. The 
distance of a pixel from the base is a measure of vegetation canopy and the 
distance that a pixel is from the Y-axis is a measure of its brightness. A 
scatter plot is generated for each acquisition in the data base. 

Time plots display green number versus time and brightness versus time, as 
shown in figure 3-4. Two dots (pixels) are plotted per graph for every usable 
acquisition in the data base. Time plots show the changes in green number 
and/or brightness for a particular pixel over an entire growing season. 

A trajectory plot displays a spectral pattern for a pixel over a period of 
time. It uses the same axes information as does a scatter plot, but it con- 
tains data on one pixel for up to eight acquisitions, as shown in figure 3-5. 
Figure 3-3.- Scatter plot. 

Figure 3-5.- Trajectory plots. The grid intersection and green number and brightness 
values for the first five acquisitions are printed at the top of each plot. Due 
to the scale of the plot, values may fall in the same position on the plot and 
not be represented with a letter. 
4. TECHNICAL APPROACH 

The approach to the task (ref. 5) consisted of two phases. In the first 
phase, the then current procedures for labeling small grains (ref. 6) were 
examined for their applicability to the corn/soybeans case. Typically, these 
procedures consist in the examination of various alternative pieces of 
evidence to make a decision relating to land usage. Thus, the first step was 
to make this decision process more objective by eliminating the alternatives. 
Only one of the alternatives was selected for the decision. Then, the process 
was formalized by reformatting it in the form of decision points arranged in a 
tree-like structure. In the second phase, a separate effort was mounted to 
address the decision-making for the decisions that were more specifically 
related to corn and soybeans. These decisions were also formatted in a tree- 
structured approach. 

In order to design the structure of each step of the second phase of the 
study, the different land uses and crop types were observed on each of the 
analyst aids to identify distinctive characteristics and trends. Ground-truth 
information was used when analyzing the film products and the spectral aids. 
Ground-truth labels were obtained from an annotated aerial photograph with a 
registered grid overlay. The grid overlay corresponds to the film product 
grid. The ground-truth pixels which were used for this study spectrally and 
spatially represent only one category (pure pixel). 

Acquisition-specific information was collected and analyzed for corn and 
soybeans. Appendix A contains an explanation and table of that information. 
These data were then used to define biowindows and image characteristics of 
the corn and soybeans. The spectral aids were examined for patterns which 
would separate corn and soybeans from each other and from other crop types 
(ref. 7). Then each of the analyst aids was evaluated according to its 
suitability for use at specific decision points. Thus, a structure was built 
up using these objective observations to make decisions, each of which would 
be an element of the structure, and each branch or set of decisions would lead 
the analyst to a crop identification and label. 
Two tests were performed using the corn/soybean decision logic. The first 
experiment was designed to identify problems with the procedure and provide 
for improvements before further testing. Labeling accuracies and the effects 
of the group (analyst) and region were addressed. The second test was 
designed to perform a within-strata variance study and estimate sampling and 
classification variance. This information would then be an input to a 
simulated aggregation. This test allowed for the use of the labeling logic in an 
operational-type environment. Only preliminary labeling results have been 
obtained on this second test. 
5. DESCRIPTION OF THE DECISION LOGIC 


The procedure developed from the analysis of the analyst aids available for 
the eight segments uses Landsat data in both imagery format and spectral aids 
as input. The logic diagram that leads to land usage and crop identification 
consists of four steps: 

Step 1 - identification of cropland 

Step 2 - identification of summer cropland 

Step 3 - identification of definite corn and soybean signatures 

Step 4 - identification of the remaining signatures 

5.1 STEP 1 - IDENTIFICATION OF CROPLAND 

Step 1 consists of the series of decision points arranged in the tree-like 
structure (decision tree) presented in figure 5-1. All workable simulated CIR 
Landsat acquisitions over the segment are used to sort the signatures in the 
scene into land-use categories. A minimum data set of two acquisitions is 
necessary for use of this tree. However, the decision tree is normally used 
in conjunction with the subsequent steps which impose more stringent require- 
ments on the data set. The lowest level crop(s) of interest dictate the 
minimum data set. 

To identify the land use associated with a particular signature, the analyst 
follows a path determined by the decisions given at the decision points 
encountered. The questions asked at each decision point are keyed by number, 
as shown in figure 5-1, and appear in figure 5-2. Each decision point is 
designed to use information extracted from the imagery based on the color of 
the crop in an acquisition in relation to the color in other acquisitions. 
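
Following a path through keyed decision points can be sketched with a small tree structure. The branching and the outcome labels below are hypothetical and serve only to show the mechanism; the actual arrangement of decision points is the one given in figure 5-1, and the question numbers refer to figure 5-2.

```python
class Node:
    """A decision point: a question key plus a branch for each answer."""
    def __init__(self, question, if_yes, if_no):
        self.question = question
        self.if_yes = if_yes
        self.if_no = if_no

def follow_path(node, answers):
    """Walk the tree using the analyst's yes/no answers; return the label."""
    while isinstance(node, Node):
        node = node.if_yes if answers[node.question] else node.if_no
    return node

# Hypothetical two-level fragment; the real branching is in figure 5-1.
tree = Node(1, if_yes=Node(4, "cropland", "noncropland"),
            if_no=Node(2, "water", "nonvegetated"))
label = follow_path(tree, {1: True, 4: True})
```

Because each label is reached by a recorded sequence of answers, a labeling error can be traced back to the decision point at which the path diverged.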

The pathway thus defined allows for the identification of major land-use 
categories. Definitions and characteristics of categories identified in this 
step can be found in appendix B. Since definitions from other sources 
(ref. 8) combine categories that are separable with this procedure or alter- 
natively include features which are too small to be detected on Landsat 
DECISION CRITERIA FOR MAJOR LAND-USE CATEGORIES 


1. Is the area some shade of red (red, pink, brown, orange, etc.) on at 
least one acquisition? 

2. Does the area appear to be water (dark blue-black to bright blue) on any 
of the acquisitions? 

3. Is the area some shade of red on all acquisitions (i.e., no planting or 
harvest appearance)? 

4. Is the area harvested (blue, green, white, gray, yellow) on an 
acquisition following the one in which It appeared red? 

5. Is the area red or reddish brown throughout the year, with the color most 
intense during the late spring or early summer? (Some trees lose their 
leaves annually and may appear dark brown during the winter.) 

6. Is the area large and irregular? 

7. Is the area large relative to the economic endeavor of the area, along a 
drainage network, and bright red in late spring and early summer and 
reddish brown or brown at other times? 

8. Is the shape of the area similar to areas that have been Identified as 
cropland and the color green or blue (may vary from dark to light during 
the year) on all acquisitions? 

9. Is the area small and white to dull gray? 

10. Is the area irregular in shape and a constant white to mottled steel blue 
throughout the year? 

11. Does the area appear to be constantly bright with no green vegetation and 
no seasonal change in shape or size? 

12. Does the area appear dark blue-black to bright blue on all acquisitions? 
(Size and shape may change during year, but area is not seasonally wet.) 


Figure 5-2.- Decision criteria questions keyed to 
the decision points in figure 5-1. 
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Imagery, definition of the categories as used In the decision tree are neces- 
sary. All major land-use categories are labeled except for cropland which 
will be refined through further analysis. Labels are always associated with 
the dot which represents the area and signature being Identified. 

5.2 STEP 2 - IDENTIFICATION OF SUMMER CROPLAND 

The signatures identified as cropland in Step 1 are separated into summer and
nonsummer cropland by following Step 2. In order to perform this step, three
biowindows are defined using the corn and soybean historical crop calendars,
the 18-day ground truth observations, and Landsat CIR film products. (The
ground truth observations are used only for development; ground truth
information is not available during testing.) A biowindow is a time in the
growth cycle of a crop when predictable Landsat signatures can be identified.
Corn and soybean biowindows are described in table 5-1, and crop growth stage
numbers for corn and soybeans are shown in table 5-2.

Figure 5-3 is a display of the crop calendar annotated with the defined 
biowindows. Figure 5-4 is the flow diagram for separating summer and non- 
summer cropland. Fields that are bare soil (not red on imagery) on at least 
one acquisition in biowindow A, green vegetation (red on imagery) on all 
acquisitions in biowindow B, and ripe and/or harvested (not red on imagery) on
all acquisitions in biowindow C are identified as summer crops. The nonsummer 
crop signatures are labeled at this point and the summer crop signatures are 
further processed in Step 3. 
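
The Step 2 rule stated above can be sketched in a few lines of Python. This is only an illustration of the three biowindow conditions, not the analysts' actual tooling: the function name, the dictionary layout, and the decision to return False when a biowindow has no acquisition are all assumptions made here.

```python
from typing import Dict, List


def is_summer_crop(red_by_window: Dict[str, List[bool]]) -> bool:
    """Apply the Step 2 rule to one dot.

    red_by_window maps a biowindow name ("A", "B", "C") to one flag per
    acquisition in that window: True if the dot appears red (green
    vegetation) on the CIR imagery, False otherwise.
    """
    a = red_by_window.get("A", [])
    b = red_by_window.get("B", [])
    c = red_by_window.get("C", [])
    # Assumption: a window with no acquisition cannot confirm a summer crop.
    if not (a and b and c):
        return False
    bare_in_a = any(not red for red in a)   # bare soil on at least one A acquisition
    green_in_b = all(b)                     # green vegetation on all B acquisitions
    ripe_in_c = all(not red for red in c)   # ripe or harvested on all C acquisitions
    return bare_in_a and green_in_b and ripe_in_c
```

A dot that is red on every biowindow A acquisition, or not red on any biowindow B acquisition, falls out as nonsummer cropland under this sketch.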

Dots which represent more than one signature either as a boundary between two
categories or because of misregistration between acquisitions are identified
and appropriately documented during this step because this is usually the last
step that requires film products. Misregistered dots may be reserved for
labeling in Step 4.


TABLE 5-1.- CORN AND SOYBEAN BIOWINDOWS

Biowindow   Definition (a)                         Description of expected
            Open on latest   Close on earliest     characteristics

A           C 30%>1          C 80%>2               Plowing, planting, pre-emergence, or very
            S 30%>1          S 10%>2               early emergence for summer crops

B           C 50%>3          C 30%>5               Full ground cover and green vegetation
            S 10%>3          S 10%>5               for summer crops

C           C 100%>5         C 80%>6 +30 days      Mature, harvest, and post-harvest for
            S 100%>5         S 80%>6 +30 days      summer crops

(a) For example, entry C 30%>5 means that, according to the
normal crop calendar, corn is 30 percent past stage 5
(maturity). Dates should be determined for both corn
and soybeans and the latest used to open windows, the
earliest to close windows.
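
The footnote's "latest opens, earliest closes" rule can be illustrated with a short Python sketch. The function name and the calendar dates in the usage note are hypothetical; in practice the dates would be read from the normal crop calendar for each crop.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def biowindow_range(open_dates: Dict[str, date],
                    close_dates: Dict[str, date]) -> Optional[Tuple[date, date]]:
    """Combine per-crop dates into one biowindow.

    open_dates and close_dates map a crop name to the date on which that
    crop meets the opening or closing criterion. The window opens on the
    latest opening date and closes on the earliest closing date.
    """
    opens = max(open_dates.values())
    closes = min(close_dates.values())
    if opens > closes:
        return None  # the crops' dates leave no common window
    return opens, closes
```

For example, with hypothetical opening dates of May 10 (corn) and May 25 (soybeans) and closing dates of June 20 (corn) and June 30 (soybeans), the window runs from May 25 to June 20.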


TABLE 5-2.- GROWTH STAGE NUMBERS FOR CORN AND SOYBEANS

Growth stage   Corn growth stage     Soybean growth stage
number

0              Plowing               Plowing
1              Planting              Planting
2              Floral initiation     Rapid nodal development
3              Tassel-silk           Full pod
4              Denting               Full seed
5              Maturity              Maturity
6              Harvest               Harvest






NORMAL CROP-CALENDAR PLOTS FOR STATE: IOWA, CROP REPORTING DISTRICT 2 



Figure 5-3.- Crop calendar annotated with biowindows. 










Figure 5-4.- Diagram of decision logic for summer and
nonsummer cropland separation (Step 2).


5.3 STEP 3 — IDENTIFICATION OF DEFINITE CORN AND SOYBEAN SIGNATURES 

The logic flow of this step is diagrammed in figure 5-5. A minimum data set 
is required for identifying corn and soybeans. Two acquisitions are 
necessary, one acquisition in either biowindow A or biowindow C and one 
acquisition in a subset of biowindow B, called a separation biowindow, and 
defined as shown in the following table. 


Definition                              Description of expected characteristics
Open on latest   Close on earliest

C 90%>3          C 30%>5                Most of the corn is in the denting stage,
S 50%>3          S 10%>5                and most of the soybeans are in the full
                                        pod stage.


A green-number-versus-brightness scatter plot of 209 unlabeled dots selected 
by systematic random sampling from within the scene is generated for each 
acquisition in the separation biowindow. An analyst team (3 to 5 analysts) 
determines which acquisition has the best separation or natural break in the 
data. Lines are drawn through the break in the data that best separates the 
two groupings. One of the groupings will be associated with corn and the 
other with soybeans. The lines are constrained to be parallel to the x and y 
axes. Then, five counts are added to and subtracted from each line to form
limiters, as shown in figure 5-6. The shaded area accounts for areas of
overlapping categories. All summer crop dots that fall outside the limits in
quadrant 1 are labeled soybeans, and all summer crop dots that fall outside
the limits in quadrant 3 are labeled corn. Table 5-3, which shows the green
number and brightness table generated with the scatter plot, is used to
expedite this process. All
dots within the limiters (shaded area) are reserved for labeling in Step 4 
along with misregistered dots. 
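
A minimal sketch of the Step 3 labeling rule follows. It assumes that quadrant 1 is the region where both green number and brightness exceed the break lines and that quadrant 3 is the region where both fall below them; the text does not state the quadrant convention or what happens to dots in the other two quadrants, so those dots are reserved here. The function and parameter names are illustrative only.

```python
def label_summer_dot(green: float, brightness: float,
                     green_break: float, brightness_break: float,
                     margin: float = 5.0) -> str:
    """Label one summer crop dot from the separation-acquisition scatter plot.

    green_break and brightness_break are the analyst-drawn break lines
    (parallel to the axes); margin is the five-count limiter on each side.
    """
    above = (green > green_break + margin
             and brightness > brightness_break + margin)
    below = (green < green_break - margin
             and brightness < brightness_break - margin)
    if above:
        return "soybeans"   # outside the limits in quadrant 1 (assumed upper right)
    if below:
        return "corn"       # outside the limits in quadrant 3 (assumed lower left)
    return "reserved"       # within the limiters, or in an unspecified quadrant
```

Dots returned as "reserved" correspond to those held over for Step 4.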

5.4 STEP 4 — IDENTIFICATION OF THE REMAINING SIGNATURES 

Two methods of analyzing the remaining dots are represented in the flow
diagram (figure 5-7) depending on the type of dot being labeled. If the dot
is misregistered (edge dot), then the area the dot is in on the base
acquisition is compared with areas of known corn and soybeans and labeled
according to the area it most closely resembles. Green number and brightness
are plotted versus time for all acceptable (cloud- and haze-free) acquisitions
to aid the analyst in labeling the dots that fell within the limiters. These
time profiles are obtained for all previously labeled and unlabeled samples.

Figure 5-6.- Delineation of break in data and limiters
on scatter plot for Step 3.

In Step 4, the analyst compares corn and soybean profiles labeled from Step 3
with the profiles of the yet unlabeled dots. The unlabeled profiles are then
labeled by assigning them the label of the most similar profile.
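
The report leaves "most similar" to the analyst's visual judgment. If this comparison were automated, one possible stand-in is a nearest-profile rule, sketched below. Euclidean distance is an assumption made for illustration and is not a measure named in the report; each profile is taken to be a flat list of green number and brightness values over the same acquisitions.

```python
from math import dist
from typing import List, Sequence, Tuple


def nearest_profile_label(profile: Sequence[float],
                          labeled: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the label of the labeled profile closest to `profile`.

    Each profile is [green_1, brightness_1, green_2, brightness_2, ...]
    for the same ordered set of acquisitions.
    """
    if not labeled:
        raise ValueError("at least one labeled profile is required")
    best_label, best_distance = "", float("inf")
    for label, reference in labeled:
        if len(reference) != len(profile):
            raise ValueError("profiles must cover the same acquisitions")
        d = dist(profile, reference)  # Euclidean distance (assumed measure)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label
```

The profile values used with such a routine would come from the Step 3 labeled dots; no values from the report are implied here.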




6. SUMMARY OF TESTS AND RESULTS 

The two tests conducted using the corn/soybean decision logic procedure were 
the Multicrop Exploratory Experiment (ref. 1) and the Simulated Aggregation 
Test. In the first test, the objectives were to shake down the procedure and
to determine if the procedure is analyst dependent. The objectives of the
second test were to test the procedure that resulted after modifications based
on the first test were included and to provide [...]. Information such as
segment number, location, acquisitions used, defined biowindows, and the
separation point for the data sets used is presented in appendix C.

For the multicrop test, a rigid design plan was followed using three groups of 
analysts and preselected segments and acquisitions. Each segment was worked 
by at least two groups. In the simulated aggregation test, three analyst 
teams (group I, group II, and group III) were responsible for doing the entire 
labeling procedure including segment and acquisition selection. Of the 100 
segments designated for the test, 88 met the labeling criteria. Each segment 
was labeled only once. Included in the second test were 23 segments from the 
first test which were relabeled by a new analyst team. 

Overall labeling accuracies comparing analyst labels to pure small-dot
ground-truth labels (ref. 9) for each test are presented in table 6-1. The better
accuracies in the second test are attributed to improvements made to the pro- 
cedure based on results from the first test. Also, the analyst labeled 
approximately 60 spectrally pure dots as opposed to approximately 140 spec- 
trally mixed or pure dots for which labeling was required in the first test. 

Although no significant difference was found, a comparison of the labeling 
accuracies in table 6-2 shows that the proportion of correct labels at the 
segment level was generally better in the second test.

During the second test, only acquisitions within a biowindow were used, and
two to four acquisitions were acceptable. Preselected acquisitions used in
the Multicrop Exploratory Experiment provided less than optimum data for some
segments because they had to be chosen before biowindow definition guidelines
had been completed and before retro-ordered acquisitions were available. For
example, four acquisitions were required for processing. Therefore, the
fourth acquisition usually occurred outside a window, causing confusion
because of mixed signatures. In some cases, acquisitions outside a biowindow
were used when an equally good or better acquisition was available in the
biowindow. This improvement to the test design may explain in part the better
accuracies observed in the second test.

TABLE 6-1.- LABELING ACCURACY FOR ANALYST LABELS COMPARED
TO PURE SMALL-DOT GROUND-TRUTH LABELS

TABLE 6-2.- LABELING ACCURACY FOR TWENTY-THREE SEGMENTS

Misregistered data affected labeling accuracy.

Other trends were observed during test evaluation. One observation from the 
first test was that, from the first to the second time a segment was labeled, 
accuracies increased 74 percent of the time for corn and 56 percent of the 
time for soybeans. This indicates that, as the analyst becomes more familiar 
with procedures, labeling accuracy may improve. 

The labeling accuracy of group III for corn was significantly different when
compared to the accuracy obtained by other groups (ref. 1). For some
segments, group III picked a different separation date or differed in the
placement of the separation point on the scatter plot. In those cases, the
inconsistencies had a definite effect on the correct identification of corn
and soybeans.
The overall labeling accuracies were affected negatively by this group effect. 

Some problems with the procedure were identified in the procedure control 
reports (refs. 10 and 11) as follows: 

• Although biowindow definitions were considered to be straightforward,
biowindow ranges determined by two different teams sometimes varied by as
much as 20 days. The primary reason for the discrepancies was related to the
use of the crop calendar shown in figure 6-1. This presentation of crop 
calendar information, depicting 10-day intervals, was not conducive to 
defining biowindow ranges consistently. Differences in biowindow length 
could seriously affect the acquisition selection. 
• The spatial and color determinations which were made from the imagery
introduce subjective judgments into the procedure. Identification of mixed
and misregistered pixels was a difficult task to accomplish. Inconsistency
was observed in two forms: by the same individual at different times and
between individuals. Color determinations also differed from analyst to
analyst.

• Currently, the decision logic only identifies the normal corn/soybean
growth cycles. Deviations caused by double cropping, episodal events, and
late and early planting were not accounted for in the decision logic.

In summary, the corn/soybean decision logic procedure was easily learned and
implemented by both experienced and inexperienced analysts. The amount of
time necessary to do the procedure compared favorably with other procedures.
Quality assurance (Procedures Control) and error characterization functions
were objective because the decision logic was systematic enough that
diagnostics could be readily applied to identify the steps where labeling
problems occurred. Steps which required changes and/or modifications were
recognized readily. In addition, several parts of the decision logic,
particularly Steps 2, 3, and 4, could be automated.
7. RECOMMENDATIONS

In order to refine the current decision logic, various actions should be
undertaken:

• Normal (historical) crop calendars, which often contained interpolated data 
and represented only two to five years of information, should be expanded 
to increase reliability and should have a standard format to allow for 
consistent definition of biowindow ranges. Current year crop calendar 
information and adjustable growth models would aid in future development 
and more accurate biowindow definitions. 

• Further study is needed to determine if incorporation of spectral aids into 
Step 1 and Step 2 could alleviate some of the current inconsistencies in 
those steps. 

• Various parts of the decision logic should be automated. Some of the
subjective decisions that an analyst is forced to make could be alleviated by
using a boundary detection algorithm (e.g., BLOB, ref. 12) and a curve
comparison routine (e.g., Badhwar, ref. 13). Both the biowindow definitions
and the scatter plot break are conducive to automation. If a color
determination scheme (e.g., Cate's color model, ref. 14) were incorporated into
the procedure, then Steps 2, 3, and 4 could be completely computerized.

The corn/soybean decision logic has produced encouraging results in the 
U.S. Corn Belt. Further study should be done to determine if this procedure 
can be extended to other geographic locations. Also, investigations should be
done to determine if this method of crop labeling can be expanded to other 
crops. 


APPENDIX A 

OBSERVED CHARACTERISTICS OF CORN AND SOYBEANS 


The characteristics of corn and soybeans which were observed on the
development segments are presented in tables A-1 through A-4.

For both crops, the growth stages corresponding to each acquisition are
presented in terms of historical data and current-year observations. The
historical growth stages are taken from CRD normal crop calendars. The observed
growth stages are taken from segment crop calendars that were constructed from
actual field observations collected for approximately 10 fields per segment at
various times throughout the growing season.

In tables A-1 through A-4, image appearance refers to colors observed on the
Product 1. The green number and brightness for corn and soybeans are
presented in terms of the means and standard deviations of pure pixels.
TABLE A-1a.- OBSERVED CHARACTERISTICS OF CORN AND SOYBEANS
AS A FUNCTION OF GROWTH STAGE, APU 14

TABLE A-2a.- OBSERVED CHARACTERISTICS OF CORN AND SOYBEANS
AS A FUNCTION OF GROWTH STAGE, APU 24

Brightness 43±5.4 77±4.3 69±4.3 4?±3.4 46±7.5 41±3

TABLE A-2b.- OBSERVED CHARACTERISTICS OF CORN AND SOYBEANS
AS A FUNCTION OF GROWTH STAGE, APU 24

TABLE A-3a.- OBSERVED CHARACTERISTICS OF CORN AND SOYBEANS
APPENDIX B

DEFINITIONS AND CHARACTERISTICS OF DECISION-TREE CATEGORIES

B.1 RANGE

Range is uncultivated land that produces forage suitable for livestock 
grazing. Generally, it is land that is not suited for other types of agricul- 
ture, and the natural vegetation consists of predominantly grasslike plants, 
forbs, or shrubs. Most range in the United States is west of a north-south 
line that cuts through North and South Dakota, Nebraska, Kansas, Oklahoma, and 
Texas. 

Characteristics: 

1. Large and irregular in the Western United States 

2. Vegetation indication varied, both within a specific area and between 
different areas; permanent, with some seasonal change 

3. No planting or harvest 

4. Coarse texture

5. Red-brown to red in summer and a shade of gray in winter 

6. Can occur in conjunction with and adjacent to cropland 

7. Best detected in spring 


B.2 PASTURE

A pasture is a fenced or unfenced tract of land on which farm animals feed by 
grazing. Generally, it is a grass area, but it may also have brush and trees. 
This land category includes land used for feeding at a specific time in
rotation with other uses; therefore, land in this situation could be pasture one
year and cropland the next. It must be emphasized that the distinction 
between pasture and range is one of degree and location rather than of actual 
difference in use. Some definitions of pasture list range as a synonymous 
term. 

Characteristics: 

1. Shape varied; geometrical in Eastern and Central United States 

2. Size small in Eastern United States, becoming larger westward

3. Easily confused with range 

4. Color varied and mixed, ranging from mottled light pink or gray-brown to 
bright red on highly improved pastures 

5. Seasonal changes; no planting or harvest unless new pasture being 
initiated or old one destroyed 

6. Best detected in spring 


B.3 ORCHARDS

An area or enclosure devoted to growing fruit, nuts, or certain forest
products either as a commercial crop or for reseeding is categorized as an
orchard. Isolated small enclosures used for these purposes on small farms
would not be recognizable on Landsat imagery.

Characteristics: 

1. Varied appearance, depending upon such variables as type of trees, 
spacing, age, canopy, time of year, and farming practices 

2. May closely resemble forest — bright red in late spring and early summer, 
red-brown at other times 

3. Size small in relation to forests

4. Shape and pattern generally regular

5. Area extent usually constant over long time periods 
B.4 FOREST

A forest is a plant association predominantly of trees and other woody
vegetation that occupies a rather extensive area.

Characteristics:

1. Shape, pattern, and size irregular 

2. Generally follows terrain and drainage 

3. No planting or harvest as with crops, but annual loss of leafage by 
certain trees 

4. Area extent usually constant over long time periods 

5. Bright red in late spring and early summer and reddish brown at other 
times; variation in intensity and shade 


B.5 URBAN

This category is composed of areas that have much of the land covered by
structures. It includes villages, towns, cities, strip developments,
transportation and industrial areas, shopping centers, parks, cemeteries, golf
courses, and sewage plants, as well as institutions that may, in some 
instances, be isolated from the main urban area. It also includes those areas 
that strictly are not urban but have been surrounded by urban development. 

Characteristics: 

1. Irregular in shape and area extent 

2. Grid pattern within urban boundaries 

3. White to a mixed mottled steel blue; constant through time 

4. Texture usually extremely fine 

5. Possible occurrence of irregularly shaped areas of light pink to medium 
red within urban area 

6. Close correlation of pattern with urban outline on map 

7. Transportation network associated with urban area basically white; can be 
constant through time 
B.6 BARREN LAND

Barren land has a limited ability to support life. Generally, this is an area 
of thin soil, sand, or rock. Vegetation, if present, is more widely spaced 
and scrubby than that in the range category. Within this category are dry 
salt flats, sandy areas other than beaches, exposed rock, and extractive 
activities (e.g., strip mines, borrow pits, and gravel pits, either active or
inactive) having significant surface expression (area).

Characteristics:

1. Bright and constant throughout year 

2. Varied dark and light colors and tones 

3. Irregular shape 

4. Little or no vegetation 

5. Size varied, ranging from minute (1 pixel) to extreme (1000 pixels or more) 

6. No seasonal change in shape and size 


B.7 OTHER AGRICULTURAL LAND

This category is for those items not classified under separate agricultural 
categories. It includes farmsteads, farm lanes and roads, ditches, horse 
farms, confined feeding operations such as beef cattle and swine feedlots, 
dairy operations, and large poultry farms. Generally, these items are small 
in area, and it is doubtful that items of this nature can be interpreted on 
Landsat imagery as being other than a farm or farmstead. 

Characteristics: 

1. Color extremely varied and mixed, white to a dirty or off white for 
farmsteads and related activities 

2. Area extent small 

3. No green vegetation 

4. No planting or harvest 

5. Can occur in conjunction with and adjacent to cropland 


B.8 WATER

This category refers to those areas persistently water covered. It includes 
rivers, streams, canals, lakes (natural and manmade), reservoirs, and bays and 
estuaries that extend inland. 

Characteristics: 

1. Irregular in shape except in some cases where manmade 

2. May change slightly in shape and size during year 

3. Should closely resemble shape and size on map, if mapped 

4. Color varied, ranging from a dark blue-black to a bright blue, but usually 
some shade of blue throughout year 

5. Smooth and uniform texture 

6. No vegetation 
B.9 CROPLAND

Cropland includes all land tilled for crops, as well as cultivated wetlands 
such as the flooded fields associated with rice production and developed 
cranberry bogs. 

Characteristics: 

1. Distinctive geometric field and road pattern in Central and Western United 
States; irregular and unsystematic in Eastern United States 

2. Definite seasonal and intraseasonal changes in color, generally some shade
of red or red-brown during growing season 

3. Variation in color and intensity with crop type 

4. Planting and harvest 

5. Vegetation present but not permanent 

6. Best detected in summer and early fall 
B.10 FALLOW

This is cultivated land that may be kept free of vegetation by such methods as
plowing and disking in order to destroy weeds or to conserve a supply of
moisture for a succeeding crop.

Characteristics:

1. Shape and pattern similar to areas identified as cropland

2. Planting or harvest

3. Constant blue-green in color, but may vary from dark to light during year
B.11 WETLANDS

Areas where the water table is at, near, or above the land surface for a
significant part of most years are categorized as wetlands. This category
includes marshes, swamps, and tidal flats along the shallow margins of bays,
lakes, rivers, and manmade impoundments or reservoirs, bogs, wet meadows,
seasonally wet or flooded basins, playas, potholes, and wetland used for
wildlife purposes. It does not include wetlands drained for any purpose or
wetlands used for rice or similar types of production; these belong to other
categories. Wetlands can be either forested or unforested.

Characteristics:

1. Highly varied appearance, both in color and intensity, depending upon such
variables as vegetation type, wet or dry season, and winter or summer 

2. Irregular in size and shape; not similar to areas identified as cropland 

3. Intermittent water possible during year 

4. No planting or harvest 

5. Seasonally wet 
APPENDIX C

DATA SETS USED IN TESTING

The following tables contain the segment numbers, the state and the APU in
which the segment is located, the separation acquisition, the acquisitions
used for batch processing, the biowindow ranges, the number of available
acquisitions in each biowindow, and the green number-brightness break in the
data on the separation acquisition for all of the segments processed.

Table C-1 shows the data set for the Multicrop Exploratory Experiment.

Table C-2 shows the data set used in the Simulated Aggregation Test.
a The base date and acquisitions 2, 3, and 4 are the same for each processing.
b A misregistered date (8292) caused inaccurate labeling of this segment.
c Other acquisitions were available within a biowindow range.

1. INTRODUCTION

The Simulated Aggregation Test (SAT): U.S. Corn and Soybean Exploratory
Experiment was executed (1) to determine the labeling accuracy obtainable with
the current corn and soybean labeling procedure and to determine the crop
proportion-estimation errors of the resulting proportion estimates; (2) to
compare the corn and soybean labeling procedure utilized in the SAT with that
utilized in the Classification Procedures Verification Test (PVT) via a
comparison of the labeling accuracy and the proportion-estimation errors of
the two procedures; and (3) to test the aggregation logic for obtaining crop
area and production estimates at state and regional levels. This report
presents the results of (1) and (2).

The design of the SAT called for three analyst-interpreter (AI) groups (two
from NASA and one from Lockheed) to label 50 to 70 Type I dots on each of 88
segments located in 5 agro-physical units (APU's) in 6 states of the U.S.
Corn Belt. Each segment was to be labeled only once using a modified
version of the corn and soybean labeling procedure utilized in the PVT (refs. 1
and 2).

Of the 88 segments labeled, 23 were a subset of the 29 blind sites processed 
in the PVT; 35 were additional blind sites; and the remaining 30 were nonblind 
sites. All the 23 segments in the SAT that were also processed in the PVT 
(hereafter referred to as Group 1 segments) had digitized ground truth 
available. Of the additional 35 blind sites (hereafter referred to as Group 2 
segments), 18 had digitized ground truth available, and the remaining 17 had 
400-dot ground truth available. 

Since the NASA groups had already seen the ground truth for the Group 1 seg- 
ments, it was stipulated that these 23 segments would be processed by the 
Lockheed group. Otherwise, there were no constraints on the assignment of 
segments to the AI groups. Table 1-1 shows the assignment of the blind sites 
to the APU's and AI groups. 
Table 1-1. Blind-site segment numbers by AI group (APU column headings:
14, 24, 25, 28)

Group A: 888, 142, 890, 137, 120, 355, 895, 866, 1872, 145, 325, 397, 371,
827, 375, 340, 348, 851

Group B: 887, 134, 894, 138, 826, 896, 183, 836, 362, 347, 867, 349, 874,
855

Group C: 864, 135, 107, 809, 123, 842, 865, 184, 141, 127, 843, 877, 870,
144, 133, 852, 880, 882, 205, 828, 853, 881, 800, 332, 860, 837


2. ANALYSIS OF THE SIMULATED AGGREGATION TEST 


Analyses were made to investigate the crop proportion-estimation accuracy and
dot-labeling accuracy in the SAT as well as to compare the crop
proportion-estimation accuracy and dot-labeling accuracy of the SAT with that
of the PVT.


2.1 CROP PROPORTION-ESTIMATION ACCURACY IN THE SIMULATED AGGREGATION TEST 


Initially, a linear model of the form 


$$\hat{P}_{ijk} - P_{ijk} = \mu + A_i + G_j + (AG)_{ij} + e_{(ij)k}$$

was assayed where

$\hat{P}_{ijk}$ = the proportion estimate of the crop of interest for the
$k$th segment of the $i$th APU, labeled by the $j$th group

$P_{ijk}$ = the corresponding ground truth proportion

$\mu$ = the overall mean difference

$A_i$ = the effect of the $i$th APU (fixed)

$G_j$ = the effect of the $j$th group (random)

$(AG)_{ij}$ = the interaction of the $i$th APU and the $j$th group (mixed)

$e_{(ij)k}$ = the random error resulting from the $k$th segment of the $i$th
APU, labeled by the $j$th group, assumed NID$(0, \sigma^2)$.


However, for the crops of interest (corn and soybeans), the model accounted 
for less than 29 percent of the observed variation. (Table 2-1 gives the 
coefficient of determination, R^2, for each crop.) Hence, the analyses were
performed without regard to APU or group effects. 


Plots of ground truth proportions (abscissa) versus crop proportion-estimation
error (ordinate) are displayed in figures 2-1(a) for corn and 2-1(b) for
soybeans. Overestimation of corn and underestimation of soybeans are clearly
evident, a pattern that also emerged in the PVT (ref. 3).
TABLE 2-1.- COEFFICIENT OF DETERMINATION
FOR EACH CROP OF INTEREST

Crop        Coefficient of determination,
            percent

Corn        28.4
Soybeans    25.4


Table 2-2 presents the mean error, the standard deviation of the error, the 
mean square error, and the 95 percent confidence intervals of the mean error 
for the corn and soybean proportion estimates. Since neither confidence 
interval contains zero, the mean proportion-estimation error for both corn and 
soybeans is significantly different from zero (α = 0.05), with corn
overestimated an average of 4.58 percent per segment and soybeans underestimated
an average of 7.81 percent per segment. 
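
The confidence intervals quoted from table 2-2 can be checked with a short calculation. The sketch below assumes a normal-approximation interval and a sample size of 58 segments (the 23 Group 1 plus 35 Group 2 blind sites); neither assumption is stated explicitly in the report. Under those assumptions the soybean interval is reproduced to two decimals and the corn interval to within about 0.01.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def mean_error_ci(mean_error: float, std_dev: float, n: int,
                  level: float = 0.95) -> Tuple[float, float]:
    """Two-sided normal-approximation confidence interval for a mean error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95 percent
    half_width = z * std_dev / sqrt(n)
    return mean_error - half_width, mean_error + half_width


# n = 58 is an assumption inferred from the blind-site counts in the text.
corn_ci = mean_error_ci(4.58, 6.95, 58)
soybean_ci = mean_error_ci(-7.81, 5.57, 58)
```

Because neither interval contains zero under this calculation, it agrees with the report's conclusion that both mean errors differ significantly from zero.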

Table 2-3 indicates that the overestimation of corn is due largely to an
overestimation in the Group 2 segments, whereas for soybeans, the mean errors
for the Group 1 and Group 2 segments are essentially equal.

2.2 COMPARISON OF THE CROP PROPORTION-ESTIMATION ACCURACY OF THE SIMULATED
AGGREGATION TEST WITH THE CLASSIFICATION PROCEDURES VERIFICATION TEST

The comparison of the SAT with the PVT was made in two parts: 

1. A paired comparison of the Group 1 segment proportion-estimation accuracy 
with the PVT proportion-estimation accuracy. 

2. A comparison of the Group 2 segment proportion-estimation accuracy with
the PVT proportion-estimation accuracy.

2.2.1 PAIRED COMPARISON OF THE GROUP 1 SEGMENTS WITH THE CLASSIFICATION 
PROCEDURES VERIFICATION TEST 

Since the segments of the PVT were labeled by at least two AI groups whereas 
the Group 1 segments were labeled only once, it was necessary to compare the 

TABLE 2-2.- CROP PROPORTION-ESTIMATION ACCURACY FOR THE SAT

            Mean ground                  Standard                  95 percent
            truth                        deviation of   Mean       confidence
            proportion,   Mean error,    mean error,    square     intervals of
Crop        percent       percent        percent        error      mean error

Corn        40.58          4.58          6.95           68.38      [2.80, 6.36]
Soybeans    29.67         -7.81          5.57           91.54      [-9.24, -6.38]


TABLE 2-3.- CROP PROPORTION-ESTIMATION ACCURACY OF THE PVT AND THE SAT

                               Corn                            Soybeans
                    Mean      Standard     Mean      Mean      Standard     Mean
                    error,    deviation,   square    error,    deviation,   square
Test                percent   percent      error     percent   percent      error

PVT                 2.43      10.00        103.8     -4.67     6.33         61.0
SAT                 4.58       6.95         68.4     -7.81     5.57         91.5
SAT Group 1, ᵃ23    1.88       6.52         44.1     -8.10     4.71         86.8
SAT Group 2, ᵇ35    6.35       6.73         84.3     -7.62     6.13         94.7

ᵃNumber of blind site segments in the SAT that were also processed in the PVT;
referred to in text as Group 1 segments.

ᵇNumber of additional blind sites in SAT; referred to in text as Group 2
segments.



absolute value of the proportion-estimation error (absolute error) of each
Group 1 segment with the mean absolute error of the corresponding PVT segment
by means of the difference: mean absolute error minus absolute error.

The hypothesis of a mean difference of zero versus all alternatives was then
tested (α = 0.05). The results, displayed in table 2-4, show no significant
difference in the proportion-estimation accuracy of corn; however, soybeans
were underestimated to a significantly greater degree in the Group 1 segments
(a mean difference of -2.60 percent).
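
The paired comparison described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the report: the per-segment inputs are hypothetical (segment-level errors are not given in the text), the function name is invented, and z = 1.96 is assumed because the tabulated intervals appear to use it; for small samples a t quantile would normally be preferred.

```python
import math
import statistics

def paired_difference_test(pvt_mean_abs_errors, sat_abs_errors, z=1.96):
    """Test H0: mean of (PVT mean absolute error - SAT absolute error) = 0."""
    diffs = [p - s for p, s in zip(pvt_mean_abs_errors, sat_abs_errors)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                 # sample standard deviation
    se = sd / math.sqrt(len(diffs))              # standard error of the mean
    ci = (mean_diff - z * se, mean_diff + z * se)
    significant = not (ci[0] <= 0.0 <= ci[1])    # reject H0 if 0 is outside
    return mean_diff, sd, se, ci, significant

# Hypothetical per-segment absolute errors (percent), for illustration only.
pvt = [5.0, 3.2, 7.1, 4.4, 6.0]
sat = [6.1, 5.0, 7.9, 6.2, 8.3]
mean_diff, sd, se, ci, significant = paired_difference_test(pvt, sat)

# Consistency check on the reported summary statistics, assuming n = 23
# Group 1 segments: standard deviation / sqrt(n) should give the tabulated
# standard error of the mean.
se_corn = round(5.69 / math.sqrt(23), 2)   # 1.19
se_soy = round(4.53 / math.sqrt(23), 2)    # 0.94
```

A negative mean difference means the Group 1 absolute errors were larger than the corresponding PVT mean absolute errors.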

2.2.2 COMPARISON OF THE GROUP 2 SEGMENTS WITH THE CLASSIFICATION PROCEDURES 
VERIFICATION TEST 

The analysis for the comparison of the Group 2 proportion-estimation accuracy
with the PVT proportion-estimation accuracy consisted of testing the null
hypothesis that the mean error of the PVT segments minus the mean error of the
Group 2 segments was zero against all alternatives (α = 0.05). Table 2-5
displays the results of this test. Corn was overestimated to a significantly
greater degree and soybeans underestimated to a significantly greater degree
in the Group 2 segments.


2.3 LABELING ACCURACY OF THE SIMULATED AGGREGATION TEST 

Tables 2-6(a) through 2-6(c) display, for all blind sites, for the Group 1
blind sites, and for the Group 2 blind sites, the percentage of a given crop
category labeled "corn," "soybeans," and "other" (neither corn nor soybeans).
With errors of omission being essentially equal for corn and soybeans (5.80
and 5.54 percent), the confusion errors for Group 1 and Group 2 together
[table 2-6(a)] indicate that the AI groups could recognize corn signatures
more readily than soybean signatures: 6.87 percent of soybeans were labeled
corn, whereas only 1.62 percent of corn was labeled soybeans. This failure to
discriminate soybeans from corn is due to late planting of soybeans, making
the signatures of these late planted soybeans spectrally inseparable from
corn. As a result, corn is overestimated and soybeans underestimated.

TABLE 2-4.- PAIRED COMPARISON OF THE CROP PROPORTION-ESTIMATION
ACCURACY OF THE GROUP 1 SAT SEGMENTS WITH THE PVT SEGMENTS

            Mean difference                     Standard      95 percent
            (PVT and Group 1    Standard        error of      confidence
            SAT),               deviation,      the mean,     intervals
Crop        percent             percent         percent

Corn         2.01               5.69            1.19          [-0.32, 4.34]
Soybeans    -2.60               4.53            0.94          [-4.44, -0.76]


TABLE 2-5.- COMPARISON OF THE PROPORTION-ESTIMATION ACCURACY OF THE
PVT SEGMENTS WITH THE GROUP 2 SAT SEGMENTS

            Difference of
            mean errors         Standard        95 percent
            (PVT and Group 2    error of        confidence
            SAT),               difference,     intervals
Crop        percent             percent

Corn        -3.92               1.94            [-7.72, -0.12]
Soybeans     2.95               1.38            [0.25, 5.65]
























TABLE 2-6.- DISTRIBUTION OF LABELS WITHIN EACH
GROUND TRUTH CATEGORY

(a) All SAT blind sites

                             Label                     Ground truth
Ground       Corn,        Soybeans,     Other,         proportion,
truth        percent      percent       percent        percent

Corn         92.58         1.62          5.80          43.36
Soybeans      6.87        87.58          5.54          30.25
Other         2.92         1.14         95.93          26.39


(b) Group 1 blind sites

                             Label                     Ground truth
Ground       Corn,        Soybeans,     Other,         proportion,
truth        percent      percent       percent        percent

Corn         88.25         1.77          9.98          44.00
Soybeans      7.97        83.33          8.70          26.93
Other         3.69         2.35         93.96          29.07


(c) Group 2 blind sites

                             Label                     Ground truth
Ground       Corn,        Soybeans,     Other,         proportion,
truth        percent      percent       percent        percent

Corn         94.89         1.54          3.56          43.03
Soybeans      6.39        89.46          4.15          31.99
Other         2.45         0.41         97.14          24.99

The drop in labeling accuracy from the Group 2 segments to the Group 1
segments [tables 2-6(b) and 2-6(c)] is accompanied by a small increase in
confusion errors (6.39 to 7.97 percent for soybeans and 1.54 to 1.77 percent
for corn) and a rather large increase in errors of omission (4.15 to 8.70
percent for soybeans and 3.56 to 9.98 percent for corn). In other words, the
discrimination between corn and soybeans of the Group 1 segments was at
approximately the same level as that of the Group 2 segments. However, the
separation of corn and soybeans from "other" was not done as well on the Group
1 segments as on the Group 2 segments.

The discrepancy in labeling accuracy between Group 1 and Group 2 segments is 
difficult to explain. Those AI groups labeling the Group 2 segments had 
previously used, in the PVT, a corn and soybean labeling procedure similar to 
the one used for the SAT. On the other hand, the AI group labeling the 
Group 1 segments had never used a corn and soybean labeling procedure. This 
observation seems to indicate that labeling accuracy is a function of famili- 
arity with the labeling procedure. However, any effect induced by familiarity 
with the labeling procedure would be totally confounded with any effect 
induced by the segments. 

Relating the labeling accuracy of the Group 1 and the Group 2 segments to
their respective proportion-estimation accuracies (table 2-3) shows that even
though the labeling accuracy of corn and soybeans is higher for the Group 2
segments, the proportion-estimation accuracy of corn in the Group 2 segments
is much worse than that of the Group 1 segments. Also, the
proportion-estimation accuracy of soybeans is only slightly better.

This discrepancy in labeling is a result of the reduction in omission errors 
for the Group 2 segments and the spectral inseparability of some soybeans from 
corn due to late planting of soybeans. This inseparability of soybeans from 
corn results in an underestimation of soybeans and an overestimation of corn 
for both groups of segments. The decrease in omission errors for corn in the 
Group 2 segments, however, further inflates the estimate of corn. The
decrease in omission errors for soybeans appears to have little influence on
reducing the underestimation of soybeans, indicating that confusing soybeans
with corn has a greater impact on soybean proportion-estimation accuracy than
the mislabeling of soybeans as "other."
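
The way confusion and omission errors feed into a proportion estimate can be illustrated by weighting the label distribution of table 2-6(a) by the ground truth proportions. The sketch below is a pooled calculation over all dots, which is an assumption of this illustration; it is not the per-segment mean error reported elsewhere in this section, and the variable and function names are hypothetical.

```python
# Label distribution within each ground truth category, table 2-6(a).
# Row: ground truth; values: percent labeled (corn, soybeans, other).
LABEL_DIST = {
    "corn":     (92.58, 1.62, 5.80),
    "soybeans": (6.87, 87.58, 5.54),
    "other":    (2.92, 1.14, 95.93),
}
# Ground truth proportion of each category, percent, table 2-6(a).
GT_PROPORTION = {"corn": 43.36, "soybeans": 30.25, "other": 26.39}
LABELS = ("corn", "soybeans", "other")

def labeled_proportions(label_dist, gt_proportion):
    """Percent of all dots receiving each label, pooled over all sites."""
    totals = dict.fromkeys(LABELS, 0.0)
    for truth, row in label_dist.items():
        for label, pct in zip(LABELS, row):
            totals[label] += gt_proportion[truth] * pct / 100.0
    return totals

labeled = labeled_proportions(LABEL_DIST, GT_PROPORTION)
# Pooled bias of the dot-based proportion, in percentage points.
bias = {crop: labeled[crop] - GT_PROPORTION[crop] for crop in LABELS}

for crop in LABELS:
    print(f"{crop:9s} labeled {labeled[crop]:6.2f}  bias {bias[crop]:+.2f}")
```

In this pooled view the soybean proportion falls short of ground truth mainly because soybean dots labeled corn outnumber corn dots labeled soybeans, while the corn total is pulled in both directions: it gains dots from soybeans and loses dots to "other."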

2.4 COMPARISON OF THE DOT-LABELING ACCURACY OF THE SIMULATED AGGREGATION TEST 
AND THE CLASSIFICATION PROCEDURES VERIFICATION TEST 

Dot-labeling accuracy for the PVT, the Group 1 segments, the Group 2 segments, 
and the Group 1 and Group 2 segments combined is displayed in table 2-7. 
Overall, the labeling accuracy of the SAT improved over that of the PVT, with 
the labeling accuracy of the Group 2 segments contributing the most to this 
improvement. However, since dot-labeling accuracy data at the segment level 
was available only for the Group 1 segments, it was not possible to determine 
if the improvement in labeling accuracy for the Group 2 segments was 
significant. 

The labeling accuracy of each Group 1 segment was compared with the mean
labeling accuracy of the corresponding PVT segment by subtracting the Group 1
figures from the corresponding PVT figures. The null hypothesis of a mean
difference of zero was tested against all alternatives (α = 0.05). The
results are given in table 2-8.

Since each of the 95 percent confidence intervals contains zero, the null 
hypothesis that the mean difference in labeling accuracy between the PVT seg- 
ments and the SAT Group 1 segments is zero could not be rejected. 

2.5 ANALYST-INTERPRETER LABELED, TYPE 1 DOT PROPORTION ESTIMATES

Crop proportion estimates of corn and soybeans were made for each blind site
by using the proportion of dots labeled corn and the proportion of dots
labeled soybeans. Figures 2-2(a) for corn and 2-2(b) for soybeans display
plots of ground truth proportions versus the dot proportion-estimation error.

In table 2-9, the mean errors of the machine-classified estimates and the dot
estimates are displayed. For both corn and soybeans, the Type 1 dots, as a
random sample, produced smaller estimation errors, with the dot-estimation

TABLE 2-7.- DOT-LABELING ACCURACY
FOR THE PVT AND THE SAT

                              Crop
                 Corn,       Soybeans,    Other,
Test             percent     percent      percent

PVT              86          79           93
SAT Group 1      88          83           94
SAT Group 2      95          89           97
SAT              93          88           96



TABLE 2-8.- COMPARISON OF THE PVT AND THE SAT GROUP 1
LABELING ACCURACY

            Mean difference                     Standard      95 percent
            (PVT and Group 1    Standard        error of      confidence
            SAT),               deviation,      the mean,     intervals
Crop        percent             percent         percent

Corn        -3.47               11.05           2.36          [-8.10, 1.16]
Soybeans    -2.95               20.14           4.29          [-11.36, 5.46]
Other       -1.73               11.11           2.37          [-6.38, 2.92]

TABLE 2-9.- CLASSIFICATION ERRORS OF THE SAT

error for corn not significantly different from zero, although the estimate of
soybeans is biased. However, the mean square errors for the two types of 
classification are not appreciably different, indicating that if the dot esti- 
mates are not better than the machine-classified estimates, then certainly 
they are no worse. 

To compare the types of classification, two procedures were used. The first
procedure, utilizing the binomial test, investigated whether one type of
classification tended to yield superior estimation accuracy over the other.
The first step in this procedure was to determine the proportion of segments
for which the dot estimates produced smaller absolute deviations from ground
truth. (See "Improved," table 2-10.) Then the null hypothesis that this
proportion was 50 percent was tested (α = 0.05). For both corn and soybeans,
the null hypothesis was not rejected. In other words, machine classification
is no more likely to yield accurate estimates than a random sample of Type 1
dots.
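
The binomial test described above can be sketched with an exact two-sided calculation. The sketch rests on assumptions not stated in the text: a total of 58 segments (23 Group 1 plus 35 Group 2), and segment counts back-calculated from the rounded "Improved" percentages in table 2-10 (45 percent for corn, 52 percent for soybeans). The function name is hypothetical.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for H0: P(success) = 0.5, given k of n."""
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2**n   # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n   # P(X >= k)
    return min(1.0, 2 * min(lower, upper))

N_SEGMENTS = 58                                # assumption, see lead-in
corn_improved = round(0.45 * N_SEGMENTS)       # about 26 segments
soy_improved = round(0.52 * N_SEGMENTS)        # about 30 segments

p_corn = two_sided_binomial_p(corn_improved, N_SEGMENTS)
p_soy = two_sided_binomial_p(soy_improved, N_SEGMENTS)

ALPHA = 0.05
print("corn: reject H0?", p_corn < ALPHA)
print("soybeans: reject H0?", p_soy < ALPHA)
```

Under these assumptions neither p-value falls below 0.05, which matches the conclusion that the null hypothesis was not rejected for either crop.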

To further qualify the comparison, the mean improvement of machine-classified 
estimates over dot estimates (see table 2-10) was obtained by finding the 
mean, on a segment-by-segment basis, of the absolute deviation from ground 
truth of the machine-classified estimate minus the absolute deviation from 
ground truth of the dot estimate. The null hypothesis of no significant
improvement was tested (α = 0.05). The null hypothesis could not be rejected.

Thus, machine classification does not improve upon a random sample of Type 1, 
analyst-labeled dots whether measured as a reduction of mean square error, a 
likelihood of yielding more accurate estimates, or a mean difference in 
estimation accuracy. 

TABLE 2-10.- PROPORTION-ESTIMATION ACCURACY IMPROVEMENT
USING ANALYST-LABELED, TYPE 1 DOTS
AS A RANDOM SAMPLE

              Corn                               Soybeans
Improved,     Mean improvement,     Improved,     Mean improvement,
percent       percent               percent       percent

45            -1.20                 52            0.59
              ᵃ[-3.00, 0.60]                      ᵃ[-0.57, 1.75]

ᵃNinety-five percent confidence interval for the mean
improvement.

3. SUMMARY OF RESULTS


The following results emerged from the evaluation of the SAT:

1. Corn was significantly overestimated by an average of 4.58 percent per
segment (standard deviation, 6.95 percent), and soybeans were significantly
underestimated by an average of 7.81 percent per segment [standard
deviation, 5.57 percent (table 2-2)].

2. When comparing the proportion-estimation accuracy of the Group 1 SAT seg- 
ments with the PVT segments, no significant difference emerged for corn; 
however, soybeans were underestimated to a significantly greater degree in 
the SAT segments (table 2-4). 

3. When comparing the proportion-estimation accuracy of the Group 2 SAT seg- 
ments with the PVT segments, corn was overestimated to a significantly 
greater degree and soybeans underestimated to a significantly greater 
degree in the SAT segments (table 2-5). 

4. The labeling accuracy of the Group 2 segments was higher than that of the 
Group 1 segments as a result of fewer corn and soybean dots being mis- 
labeled as "other" in the Group 2 segments [tables 2-6(b) and 2-6(c)]. 

5. In the SAT, more soybeans were labeled corn than corn was labeled
soybeans. This was caused by the spectral inseparability of late planted
soybeans from corn [tables 2-6(a) through 2-6(c)].

6. The spectral inseparability of late planted soybeans from corn resulted in
the overestimation of corn and underestimation of soybeans.

7. Since fewer corn and soybean dots were mislabeled "other" in the Group 2
segments (as compared with the Group 1 segments), the estimation of corn
was further inflated, although the reduction in mislabeling had little
effect on the soybean proportion estimates [tables 2-6(b) and 2-6(c)].

8. Overall, labeling accuracy in the SAT improved over that in the PVT. How- 
ever, there was no significant difference in labeling accuracy between the 
PVT and Group 1 segments (tables 2-7 and 2-8).




9. When comparing machine-classified estimates with estimates based upon a 
random sample of Type 1 dots, machine-classified estimates did not improve 
upon the Type 1 dot, random sample estimates (tables 2-9 and 2-10). 
4. RECOMMENDATIONS


An alternative machine classification technique should be developed since the
procedure used in this experiment did not improve upon a random sample of
analyst-labeled, Type 1 dots. Methods should also be developed to compensate
for the adverse effect that late planted soybeans have upon corn and soybean
proportion-estimation accuracy.


APPENDIX D

TEST OF GROUPED OPTIMAL AGGREGATION TECHNIQUE 


This simulation study had two objectives: first, to evaluate the Multicrop
Allocation Procedure (MAP) of H. O. Hartley et al. (ref. 1), and second, to
evaluate the Grouped Optimal Aggregation Technique (ref. 2). Since one of the
major goals in the AgRISTARS program is to extend the technology developed
during the Large Area Crop Inventory Experiment (LACIE) for wheat production
to the estimation of production of several crops, the need for a MAP is
apparent.

In the MAP, the allocation problem is formulated in terms of nonlinear
programming. The actual process used was minimization of the total sample size
using a Lagrange multiplier technique, subject to the constraints that the
sample coefficient of variation (C.V.) for each crop not exceed a given value
(in this case, 5 percent).

The Grouped Optimal Aggregation Technique is designed to improve upon the
aggregation scheme used in LACIE by using a weighting scheme which combines
contextual information (neighboring strata) with the target strata information
by giving more weight to the proportion estimates of strata with plentiful
data and less weight to the estimates of strata with little data.

The simulation was performed in August and September of 1980 by A. H. Feiveson 
at the Johnson Space Center, Houston, Texas, and the methods used and results 
obtained are described in this appendix. 

D.1 BACKGROUND

The study was based on corn and soybean acreage and production statistics for
1978 in Iowa, Illinois, and Indiana. These three states were stratified into
a total of 12 acreage strata, each representing the intersection of an APU
(agro-physical unit) with a state. A total of 204 segments were then
allocated to the 12 strata using the MAP, with the goal of achieving a
5-percent C.V. for both corn and soybean production in the three-state region.
The strata and number of segments allocated to each appear in figure D-1.
Figure D-1.- All


Each entire state was one yield stratum. That is, yield numbers were given
for Iowa, Illinois, and Indiana, based on the actual yield in 1978 for each of
these three states.

D.2 MONTE CARLO SIMULATION 

Three types of simulations were performed: yield, cloud cover, and
segment-level proportion estimates.

D.2.1 YIELD SIMULATION

Each time a simulation run was performed, a yield estimate was generated for
each state. The procedure was simply to use the known yield for 1978 as the
mean and the NOAA yield model variance as the variance of a normal
distribution. A pseudorandom number from this distribution was then selected
by the computer, and this number was fed into the Grouped Optimal Aggregation
Technique as the yield number.
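
A minimal sketch of this yield draw is given below. The yields and variances in it are placeholders chosen for illustration, not the 1978 yields or the NOAA yield model variances, which are not given in this appendix; the function name and seed are likewise hypothetical.

```python
import random

def simulate_yields(known_yield, model_variance, rng):
    """Return one simulated yield per state, drawn from N(mean, variance)."""
    return {
        state: rng.gauss(mean, model_variance[state] ** 0.5)
        for state, mean in known_yield.items()
    }

# Placeholder inputs only; NOT the 1978 yields or the NOAA model variances.
KNOWN_YIELD = {"Iowa": 100.0, "Illinois": 100.0, "Indiana": 100.0}
MODEL_VARIANCE = {"Iowa": 25.0, "Illinois": 25.0, "Indiana": 25.0}

rng = random.Random(1980)      # fixed seed so that a run can be repeated
one_run = simulate_yields(KNOWN_YIELD, MODEL_VARIANCE, rng)
print(one_run)
```

Each call corresponds to the yield inputs for one simulation run; repeating the call gives independent draws for further runs.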

D.2.2 CLOUD COVER SIMULATION

The simulation was run using five acquisition rates, namely, 10 percent,
25 percent, 50 percent, 75 percent, and 100 percent. For a particular
acquisition rate r, each segment was "acquired" with probability r or "not
acquired" with probability 1 - r. In this study, a simple but rather
unrealistic assumption was made that each segment would be acquired or not
acquired independently of any other segment. Thus, the number of segments
acquired in an acreage stratum, X, follows the binomial distribution below,
where N represents the number of segments allocated to the stratum:

\Pr(X = x) = \binom{N}{x} r^{x} (1 - r)^{N - x}, \qquad x = 0, 1, 2, \ldots, N    (2.1)
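
The independent-acquisition assumption can be sketched as below: the probability function is the binomial form of equation (2.1), and the simulation draws one acquisition decision per segment. The stratum size of 17 used in the example is hypothetical (the actual allocations appear only in figure D-1), as are the function names.

```python
import random
from math import comb

def acquisition_pmf(x, n, r):
    """Pr(X = x): x of n segments acquired, each with probability r."""
    return comb(n, x) * r**x * (1 - r) ** (n - x)

def simulate_acquired(n, r, rng):
    """Number of segments acquired when each is acquired independently."""
    return sum(rng.random() < r for _ in range(n))

ACQUISITION_RATES = (0.10, 0.25, 0.50, 0.75, 1.00)
N_STRATUM = 17                  # hypothetical number of allocated segments
rng = random.Random(1980)

for r in ACQUISITION_RATES:
    acquired = simulate_acquired(N_STRATUM, r, rng)
    print(f"rate {r:.2f}: acquired {acquired} of {N_STRATUM}")
```

At low acquisition rates a stratum can end up with no acquired segments at all; with probability (1 - r)^N under this model.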

D.2.3 PROPORTION ESTIMATE SIMULATION

For each segment that was "acquired," a crop proportion estimate, p̂, was
simulated. The expected value, μ, of p̂ was taken to be the actual stratum
proportion for 1978, and the variance, σ², was taken to be the sum of the
classification variance and the sampling variance. The former was estimated
from actual Landsat segments that had been worked by analysts, and the latter
was estimated using the within-stratum variance estimation model (ref. 3).
The sampling variance was estimated for each acreage stratum, while the
classification variance was considered constant over all strata.

The distribution of p̂ was a mixture of a discrete and a continuous
distribution, as described below. Since Landsat segments occasionally contain
none of the crop of interest, it first had to be determined whether p̂ was
zero or positive. The probability of a zero proportion estimate, say α, was
taken to be the probability that a normally distributed random variable having
mean μ and variance σ² would be less than or equal to zero (see figure D-2).

Once α was determined for the stratum, the proportion was assigned the value
zero with probability α. If p̂ was not zero, its value was selected randomly
from a beta distribution with parameters a and b (chosen so that the
distribution of p̂, which is a mixture of a continuous and a discrete
distribution, would have mean μ and variance σ²). A typical beta density is
depicted in figure D-3.
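
The mixture described above can be sketched as follows. The appendix does not give the formulas used to choose a and b, so the moment-matching step here is an assumption of this illustration: the beta mean and variance are solved from the requirement that the mixture (zero with probability α, beta otherwise) has mean μ and variance σ². The example values μ = 0.30 and σ = 0.15 and all function names are hypothetical.

```python
import random
from statistics import NormalDist

def zero_probability(mu, sigma):
    """alpha = Pr(N(mu, sigma^2) <= 0)."""
    return NormalDist(mu, sigma).cdf(0.0)

def beta_parameters(mu, sigma, alpha):
    """Beta (a, b) such that the zero-inflated mixture has the given moments."""
    m = mu / (1.0 - alpha)                           # mean of nonzero part
    v = (sigma**2 + mu**2) / (1.0 - alpha) - m**2    # variance of nonzero part
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("no beta distribution matches these moments")
    total = m * (1.0 - m) / v - 1.0                  # a + b
    return m * total, (1.0 - m) * total

def simulate_proportion(mu, sigma, rng):
    """Draw one proportion estimate: zero w.p. alpha, otherwise beta."""
    alpha = zero_probability(mu, sigma)
    if rng.random() < alpha:
        return 0.0
    a, b = beta_parameters(mu, sigma, alpha)
    return rng.betavariate(a, b)

# Hypothetical stratum: proportion 0.30, standard deviation 0.15.
rng = random.Random(1980)
sample = [simulate_proportion(0.30, 0.15, rng) for _ in range(1000)]
```

The check inside `beta_parameters` reflects a limitation of this approach: for some combinations of μ and σ no beta distribution reproduces the required moments, and the sketch raises an error rather than guessing.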

D.2.4 DESIGN OF THE EXPERIMENT

A total of 1000 runs of the simulation were performed: 100 for each of the
10 combinations of acquisition rate and crop type. The simulation layout is
depicted in table D-1.

D.3 RESULTS OF THE SIMULATION

In this section, the questions that the simulation was designed to address and 
the results of the simulation study are presented. 
Figure D-2.- Determination of the probability
a proportion estimate is zero.
(μ = mean of simulated proportion estimate = stratum crop proportion in 1978;
α = probability simulated proportion estimate equals zero;
σ = standard deviation of simulated proportion estimate.)
D.3.1 QUESTIONS ADDRESSED

The simulation was performed in an attempt to answer the following questions:

a. Does the MAP provide a 5-percent C.V. for production of each of the two
crops for which the segments were allocated?

b. Does the Grouped Optimal Aggregation Technique provide unbiased acreage
and production estimates for each state and for the 3-state region?

c. Are the variance (C.V.) estimates computed by the Grouped Optimal
Aggregation Technique correct?

d. Is the Grouped Optimal Aggregation Technique robust against loss of data?

The following sections show that the answer to each of these questions is
affirmative.

D.3.2 MULTICROP ALLOCATION 

Table D-2 illustrates the effectiveness of the MAP in meeting the goal of a
5-percent C.V. for production of each crop. Note that C.V.'s are somewhat
higher for individual states than for the 3-state region. This can be
explained by noting that the goal of the allocation was to provide a 5-percent
C.V. for the entire region, not for any individual state. The entries in the
table indicate the sample C.V.'s computed from the 100 simulations on each
crop type with 100-percent acquisitions.

D.3.3 UNBIASED AGGREGATIONS

Table D-3 shows the relative bias of the aggregated production and acreage
estimates at the state and at the 3-state level for corn and soybeans at both
the 100-percent and 10-percent acquisition rates. Clearly, no detectable bias
exists at the 100-percent acquisition rate, and the small bias seen for
soybeans at the 10-percent acquisition rate could easily be due to chance. In
fact, none of the biases is significantly different from zero (statistically)
at any reasonable significance level. Hence, the conclusion is that no
procedural bias has been detected in the Grouped Optimal Aggregation
Technique.

TABLE D-2.- SAMPLE C.V.'s

State             Sample C.V.

Corn

Illinois          0.060
Indiana            .071
Iowa               .071
All 3 states       .047

Soybeans

Illinois          0.070
Indiana            .087
Iowa               .092
All 3 states       .052

TABLE D-3.- RELATIVE BIAS OF AGGREGATED PRODUCTION ESTIMATES 


State 

Acquisition rate, 
100% 

Acquisition rate, 
10% 

Corn 


Illinois

Indiana 

Iowa 

All 3 states 


Illinois

Indiana 

Iowa 


- 0.001 

.000 

.001 

.000 


Soybeans 


- 0.002 
- .006 
.009 


-0.014 

- .009 

- .023 


All 3 states 
















D.3.4 VARIANCE ESTIMATES 


Table D-4 shows the average of the estimated C.V.'s computed by the Grouped 
Optimal Aggregation Technique over 100 simulations for corn and soybeans at 
100-percent acquisition rates. From this table it is apparent that the vari- 
ance estimation procedure used in the Grouped Optimal Aggregation Technique 
provides good variance (C.V.) estimates. 

D.3.5 MISSING DATA 

A consistent problem inherent in aerospace remote sensing is nonresponse due
to cloud cover. One of the main reasons for developing the Grouped Optimal
Aggregation Technique was to provide an improved method of handling
nonresponse. It is, of course, unreasonable to expect any aggregation
procedure to perform as well with missing data as with complete data; however,
a robust procedure can be expected to provide C.V.'s which are approximately
proportional to n^{-1/2}, where n is the sample size. Figures D-4 and D-5 give
C.V.'s for production and acreage as computed from the simulation results for
corn and soybeans over the 3-state region. Also shown is kn^{-1/2}, where k is
chosen such that kn^{-1/2} = 0.05 at the 100-percent acquisition rate. These
figures show that the Grouped Optimal Aggregation Technique is quite robust
against nonresponse.
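
The reference curve kn^{-1/2} can be tabulated with a few lines of code. The sketch assumes that n at each acquisition rate is the expected number of acquired segments, that is, the rate times the 204 allocated segments; the appendix does not say how n was defined for the figures, and the names used are hypothetical.

```python
import math

TOTAL_SEGMENTS = 204     # segments allocated by the MAP
TARGET_CV = 0.05         # C.V. at the 100-percent acquisition rate

# k is fixed by requiring k * n**-0.5 = 0.05 when all segments are acquired.
k = TARGET_CV * math.sqrt(TOTAL_SEGMENTS)

def reference_cv(rate, total=TOTAL_SEGMENTS):
    """k * n**-0.5 with n taken as the expected number of acquired segments."""
    n = rate * total
    return k / math.sqrt(n)

for rate in (0.10, 0.25, 0.50, 0.75, 1.00):
    print(f"acquisition rate {rate:.2f}: reference C.V. {reference_cv(rate):.3f}")
```

Under this assumption the reference C.V. doubles to 0.10 at the 25-percent rate and rises to roughly 0.16 at the 10-percent rate, which gives a yardstick for judging the simulated C.V.'s.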

D.4 CONCLUSIONS 

We have seen that the MAP provides a good allocation for multiple-crop
surveys, at least in the two-crop case. The Grouped Optimal Aggregation
Technique was seen to give unbiased acreage and production estimates, provided 
the input segment proportion estimates are unbiased. The Grouped Optimal 
Aggregation Technique gives good variance estimates, and it is seen to be 
robust against nonresponse. On the basis of this simulation study, it is 
therefore recommended that the MAP and Grouped Optimal Aggregation Technique 
be used as baseline procedures in the 1981 experiments. 

TABLE D-4.- AVERAGE OF ESTIMATED C.V.'S
PRODUCED BY GROUPED OPTIMAL AGGREGATION
TECHNIQUE AND SAMPLE C.V.'S

                                  Average of
State             Sample C.V.     estimated C.V.

Corn

Illinois          0.060           0.053
Indiana            .071            .068
Iowa               .071            .064
All 3 states       .047            .040

Soybeans

Illinois          0.070           0.075
Indiana            .087            .096
Iowa               .092            .085
All 3 states       .052            .053